ARTICLE Open Access

Genome reannotation of the sweetpotato (Ipomoea batatas (L.) Lam.) using extensive Nanopore and Illumina-based RNA-seq datasets

1. School of Breeding and Multiplication (Sanya Institute of Breeding and Multiplication), Hainan University, Sanya 572025, China
2. Key Laboratory for Quality Regulation of Tropical Horticultural Crops of Hainan Province, School of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
^# Authors contributed equally: Bei Liang, Yang Zhou

Corresponding authors: yplee614@hainanu.edu.cn; zhuguopeng@hainanu.edu.cn

Received: 21 October 2023
Revised: 20 January 2024
Accepted: 23 January 2024
Published online: 21 March 2024
Tropical Plants 3, Article number: e008 (2024)

Highlights

The updated annotation, named v1.0.a2, includes 42,751 gene models, with 97.4% complete BUSCOs.

The updated annotation modifies or adds 31,771 gene models and identifies 8,736 genes with alternatively spliced isoforms.

We have introduced a new gene ID nomenclature (IbXXGXXXXX) as an improvement over the previous nomenclature (gene.gXXXXX).
Abstract

Sweetpotato (Ipomoea batatas (L.) Lam.) is a globally cultivated root crop of paramount significance. The genome of the hexaploid cultivar 'Taizhong 6' has been sequenced and serves as a crucial reference genome for sweetpotato and related species within the Convolvulaceae family. However, the current annotation of the sweetpotato genome relies primarily on ab initio predictions and, to a lesser extent, transcriptome datasets, which only predict coding sequences. Therefore, an improved annotation is highly desirable. Here, we present a comprehensive reannotation of the sweetpotato genome, leveraging 12 Nanopore full-length RNA libraries and 190 Illumina RNA-seq libraries. The improved annotation, named v1.0.a2, includes 42,751 gene models, with 97.4% complete BUSCOs. Within this set of gene models, we have modified or added 31,771 gene models and identified 8,736 genes with alternatively spliced isoforms. We have also introduced a new gene ID nomenclature (IbXXGXXXXX) as an improvement over the previous nomenclature (gene.gXXXXX). Additionally, we have annotated miRNAs and their targets and provided their expression levels at different storage root stages. Overall, our study provides an updated annotation of the sweetpotato genome, which will significantly facilitate gene functional studies in sweetpotato and promote genomic analyses across the Convolvulaceae family.

Key words: Sweetpotato, Genome annotation, Illumina-RNAseq, Nanopore-RNAseq, Manual curation

Supplementary information

Supplemental Table S1 RNA-seq datasets used in this study.
Supplemental Table S2 The Locus IDs for modified genes in both v1.0.a1 and v1.0.a2.
Supplemental Table S3 GO, KEGG and functional annotation of annotated genes.
Supplemental Table S4 Comparison of transcription factors and protein kinases between v1.0.a1 and v1.0.a2.
Supplemental Table S5 Primers used for verification of new gene models.
Supplemental Table S6 Expression profiles of annotated miRNAs across different stages of storage roots.
Supplemental Table S7 Predicted target genes of miRNAs in the Taizhong 6 genome.
Supplemental Fig. S1 Alignment of Sanger sequencing results with the latest annotation (v1.0.a2).

Rights and permissions
Copyright: © 2024 by the author(s). Published by Maximum Academic Press on behalf of Hainan University. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.

References

[1]	Ozias-Akins P, Jarret RL. 1994. Nuclear DNA content and ploidy levels in the genus Ipomoea. Journal of the American Society for Horticultural Science 119:110−15 doi: 10.21273/JASHS.119.1.110
[2]	Palumbo F, Galvao AC, Nicoletto C, Sambo P, Barcaccia G. 2019. Diversity analysis of sweet potato genetic resources using morphological and qualitative traits and molecular markers. Genes 10:840 doi: 10.3390/genes10110840
[3]	Woolfe JA. 1992. Sweet potato: an untapped food resource. Cambridge, New York: Cambridge University Press. https://doi.org/10.1086/417965
[4]	Kurabachew H. 2015. The role of orange fleshed sweet potato (Ipomea batatas) for combating vitamin A deficiency in Ethiopia: a review. International Journal of Food Science and Nutrition Engineering 5:141−46
[5]	Yang J, Moeinzadeh MH, Kuhl H, Helmuth J, Xiao P, et al. 2017. Haplotype-resolved sweet potato genome traces back its hexaploidization history. Nature Plants 3:696−703 doi: 10.1038/s41477-017-0002-z
[6]	Hoshino A, Jayakumar V, Nitasaka E, Toyoda A, Noguchi H, et al. 2016. Genome sequence and analysis of the Japanese morning glory Ipomoea nil. Nature Communications 7:13295 doi: 10.1038/ncomms13295
[7]	Wang D, Liu H, Wang H, Zhang P, Shi C. 2020. A novel sucrose transporter gene IbSUT4 involves in plant growth and response to abiotic stress through the ABF-dependent ABA signaling pathway in Sweetpotato. BMC Plant Biology 20:1−15 doi: 10.1186/s12870-020-02382-8
[8]	Zhang H, Wang Z, Li X, Gao X, Dai Z, et al. 2022. The IbBBX24–IbTOE3–IbPRX17 module enhances abiotic stress tolerance by scavenging reactive oxygen species in sweet potato. New Phytologist 233:1133−52 doi: 10.1111/nph.17860
[9]	Cheng CY, Krishnakumar V, Chan AP, Thibaud-Nissen F, Schobel S, et al. 2017. Araport11: a complete reannotation of the Arabidopsis thaliana reference genome. The Plant Journal 89:789−804 doi: 10.1111/tpj.13415
[10]	Ji CY, Bian X, Lee CJ, Kim HS, Kim SE, et al. 2019. De novo transcriptome sequencing and gene expression profiling of sweet potato leaves during low temperature stress and recovery. Gene 700:23−30 doi: 10.1016/j.gene.2019.02.097
[11]	Lee IH, Shim D, Jeong JC, Sung YW, Nam KJ, et al. 2019. Transcriptome analysis of root-knot nematode (Meloidogyne incognita)-resistant and susceptible sweetpotato cultivars. Planta 249:431−44 doi: 10.1007/s00425-018-3001-z
[12]	Arisha MH, Aboelnasr H, Ahmad MQ, Liu Y, Tang W, et al. 2020. Transcriptome sequencing and whole genome expression profiling of hexaploid sweetpotato under salt stress. BMC Genomics 21:1−18 doi: 10.1186/s12864-020-6524-1
[13]	Li Y, Wei W, Feng J, Luo H, Pi M, et al. 2018. Genome re-annotation of the wild strawberry Fragaria vesca using extensive Illumina- and SMRT-based RNA-seq datasets. DNA Research 25:61−70 doi: 10.1093/dnares/dsx038
[14]	Dong L, Liu H, Zhang J, Yang S, Kong G, et al. 2015. Single-molecule real-time transcript sequencing facilitates common wheat genome annotation and grain transcriptome research. BMC Genomics 16:1039 doi: 10.1186/s12864-015-2257-y
[15]	Liu T, Li M, Liu Z, Ai X, Li Y. 2021. Reannotation of the cultivated strawberry genome and establishment of a strawberry genome database. Horticulture Research 8:41 doi: 10.1038/s41438-021-00476-4
[16]	Xiong J, Tang X, Wei M, Yu W. 2022. Comparative full-length transcriptome analysis by Oxford Nanopore Technologies reveals genes involved in anthocyanin accumulation in storage roots of sweet potatoes (Ipomoea batatas L.). PeerJ 10:e13688 doi: 10.7717/peerj.13688
[17]	Li Y, Pi M, Gao Q, Liu Z, Kang C. 2019. Updated annotation of the wild strawberry Fragaria vesca V4 genome. Horticulture Research 6:1 doi: 10.1038/s41438-018-0066-6
[18]	Wang F, Tan WF, Song W, Yang ST, Qiao S. 2022. Transcriptome analysis of sweet potato responses to potassium deficiency. BMC Genomics 23:655 doi: 10.1186/s12864-022-08870-5
[19]	Suematsu K, Tanaka M, Kurata R, Kai Y. 2020. Comparative transcriptome analysis implied a ZEP paralog was a key gene involved in carotenoid accumulation in yellow-fleshed sweetpotato. Scientific Reports 10:20607 doi: 10.1038/s41598-020-77293-7
[20]	Tadda SA, Li C, Ding J, Li JA, Wang J, et al. 2023. Integrated metabolome and transcriptome analyses provide insight into the effect of red and blue LEDs on the quality of sweet potato leaves. Frontiers in Plant Science 14:1181680 doi: 10.3389/fpls.2023.1181680
[21]	Tang C, Han R, Zhou Z, Yang Y, Zhu M, et al. 2020. Identification of candidate miRNAs related in storage root development of sweet potato by high throughput sequencing. Journal of Plant Physiology 251:153224 doi: 10.1016/j.jplph.2020.153224
[22]	Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34:3094−100 doi: 10.1093/bioinformatics/bty191
[23]	Chen S, Zhou Y, Chen Y, Gu J. 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34:i884−i890 doi: 10.1093/bioinformatics/bty560
[24]	Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, et al. 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15−21 doi: 10.1093/bioinformatics/bts635
[25]	Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, et al. 2015. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology 33:290−95 doi: 10.1038/nbt.3122
[26]	Wu TD, Watanabe CK. 2005. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21:1859−75 doi: 10.1093/bioinformatics/bti310
[27]	Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, et al. 2003. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research 31:5654−66 doi: 10.1093/nar/gkg770
[28]	Hoff KJ, Lomsadze A, Borodovsky M, Stanke M. 2019. Whole-genome annotation with BRAKER. In Gene Prediction, ed. Kollmar M. New York: Humana. pp. 65−95. https://doi.org/10.1007/978-1-4939-9173-0_5
[29]	Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, et al. 2008. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biology 9:R7 doi: 10.1186/gb-2008-9-1-r7
[30]	Xia R, Meyers BC, Liu Z, Beers EP, Ye S, et al. 2013. MicroRNA superfamilies descended from miR390 and their roles in secondary small interfering RNA biogenesis in eudicots. The Plant Cell 25:1555−72 doi: 10.1105/tpc.113.110957
[31]	Xia R, Xu J, Arikit S, Meyers BC. 2015. Extensive families of miRNAs and PHAS loci in Norway spruce demonstrate the origins of complex phasiRNA networks in seed plants. Molecular Biology and Evolution 32:2905−18 doi: 10.1093/molbev/msv164
[32]	Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25:1754−60 doi: 10.1093/bioinformatics/btp324
[33]	Meyers BC, Green PJ (eds.). 2010. Plant microRNAs: methods and protocols. Totowa, NJ: Humana Press. https://doi.org/10.1007/978-1-4939-9042-9
[34]	Xia R, Zhu H, An YQ, Beers EP, Liu Z. 2012. Apple miRNAs and tasiRNAs with novel regulatory networks. Genome Biology 13:R47 doi: 10.1186/gb-2012-13-6-r47
[35]	Holt C, Yandell M. 2011. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12:491 doi: 10.1186/1471-2105-12-491
[36]	Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210−12 doi: 10.1093/bioinformatics/btv351
[37]	Jones P, Binns D, Chang HY, Fraser M, Li W, et al. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236−40 doi: 10.1093/bioinformatics/btu031
[38]	Powell S, Forslund K, Szklarczyk D, Trachana K, Roth A, et al. 2014. eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Research 42:D231−D239 doi: 10.1093/nar/gkt1253
[39]	Zheng Y, Jiao C, Sun H, Rosli HG, Pombo MA, et al. 2016. iTAK: A program for genome-wide prediction and classification of plant transcription factors, transcriptional regulators, and protein kinases. Molecular Plant 9:1667−70 doi: 10.1016/j.molp.2016.09.014
[40]	Cabreira-Cagliari C, Fagundes DGS, Dias NCF, Bohn B, Margis-Pinheiro M, et al. 2018. GILP family: a stress-responsive group of plant proteins containing a LITAF motif. Functional & Integrative Genomics 18:55−66 doi: 10.1007/s10142-017-0574-8
[41]	Lee SG, Nwumeh R, Jez JM. 2016. Structure and mechanism of isopropylmalate dehydrogenase from Arabidopsis thaliana: insights on leucine and aliphatic glucosinolate biosynthesis. Journal of Biological Chemistry 291:13421−30
[42]	Murphy AS, Hoogner KR, Peer WA, Taiz L. 2002. Identification, purification, and molecular cloning of N-1-naphthylphthalmic acid-binding plasma membrane-associated aminopeptidases from Arabidopsis. Plant Physiology 128:935−50 doi: 10.1104/pp.010519
[43]	Jin S, Kim SY, Susila H, Nasim Z, Youn G, et al. 2022. FLOWERING LOCUS M isoforms differentially affect the subcellular localization and stability of SHORT VEGETATIVE PHASE to regulate temperature-responsive flowering in Arabidopsis. Molecular Plant 15:1696−709 doi: 10.1016/j.molp.2022.08.007
[44]	Xia R, Ye S, Liu Z, Meyers BC, Liu Z. 2015. Novel and recently evolved microRNA clusters regulate expansive F-BOX gene networks through phased small interfering RNAs in wild diploid strawberry. Plant Physiology 169:594−610 doi: 10.1104/pp.15.00253
[45]	Chen C, Chen H, Zhang Y, Thomas HR, Frank MH, et al. 2020. TBtools: an integrative toolkit developed for interactive analyses of big biological data. Molecular Plant 13:1194−202 doi: 10.1016/j.molp.2020.06.009
[46]	Bo X, Wang S. 2005. TargetFinder: a software for antisense oligonucleotide target site selection based on MAST and secondary structures of target mRNA. Bioinformatics 21:1401−2 doi: 10.1093/bioinformatics/bti211

About this article

Cite this article

Liang B, Zhou Y, Liu T, Wang M, Liu Y, et al. 2024. Genome reannotation of the sweetpotato (Ipomoea batatas (L.) Lam.) using extensive Nanopore and Illumina-based RNA-seq datasets. Tropical Plants 3: e008 doi: 10.48130/tp-0024-0009

Introduction

The sweetpotato (Ipomoea batatas (L.) Lam.) is a hexaploid species (2n = 6x = 90) with an estimated genome size of 2.6 Gb^[1]. Owing to its high yield potential and its ability to thrive in diverse environments, sweetpotato has become a cost-effective source of calories, protein, fiber, minerals, vitamins, and flavonoids^[2,3], particularly in developing countries. Orange-fleshed sweetpotatoes, in particular, play a pivotal role in combating vitamin A deficiency in Africa^[4].

The first hexaploid sweetpotato cultivar to be sequenced was Taizhong 6, and that assembly was based solely on Illumina sequencing platforms^[5]. This effort produced 15 pseudochromosomes by identifying gene synteny between the enhanced haplotype of the I. batatas assembly and the Ipomoea nil genome^[6]. Subsequently, with the advent of third-generation sequencing technology, the Taizhong 6 genome was resequenced using 10X Genomics techniques and Nanopore sequencing (Oxford Nanopore Technologies). The resulting long-read assembly was anchored onto chromosomes using the linkage map. This assembly integrated homologous sequences into a haploid genome of 473.8 Mb, consisting of 15 sequences/chromosomes with an N50 length of 31 Mb. This high-quality chromosome-scale genome provides a superior reference for genomic and functional analyses of I. batatas. Using these high-quality sweetpotato genomes, candidate genes for important traits have been analyzed^[7,8].

Beyond genome assembly, accurate and complete genome annotation is crucial for making a genome useful. Achieving this often requires several rounds of reannotation of a single genome. A notable example is the 11^th annotation of the Arabidopsis genome, released in 2017^[9]. Illumina technology has enabled transcriptome resources to be established for many Ipomoea species, particularly the cultivated Ipomoea batatas^[10−12]. However, the short RNA-seq reads produced by Illumina technology are a significant hurdle for transcript assembly and annotation^[5]. In contrast, long-read sequencing from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) can provide full-length transcripts, greatly enhancing the precision of gene structure annotation^[13−15]. Full-length sequencing also benefits the analysis of alternative splicing, enabling a more comprehensive understanding of gene expression. In polyploids containing large sets of homoeologous genes, exploring transcript splicing can yield additional insights into the prevalence of subgenome dominance and the evolutionary origins of novel traits. Recently, 12 high-quality full-length transcriptomes of I. batatas were sequenced using ONT technology^[16]. This resource presents valuable prospects for further improving the I. batatas genome annotation.

In previous studies, we optimized a genome annotation pipeline to obtain high-quality gene annotations for the genomes of diploid and octoploid strawberries^[13,15,17]. To improve the annotation of the sweetpotato genome, we applied this pipeline to the available RNA-seq datasets, which included 12 Nanopore full-length libraries and 190 Illumina RNA-seq libraries. These datasets were generated from various tissues, including storage roots, leaves, and seedling tissues at distinct developmental stages or subjected to different treatments^[16,18−20]. The refined annotation, designated v1.0.a2, encompasses a total of 42,751 protein-coding genes, with 97.4% complete BUSCOs. Moreover, we identified 132 known and 15 novel miRNAs, predicted their targets, and provide the expression levels of these miRNAs at different storage root stages. Collectively, this updated annotation and the comprehensive gene expression profiles will serve as a valuable data resource for genomics and functional studies in sweetpotato.

Materials and methods

Transcriptome datasets used in this study

In this study, we gathered 12 ONT libraries generated from storage roots of both white-fleshed and purple-fleshed sweetpotato at different developmental stages^[16]. Additionally, we utilized 190 Illumina-based RNA-seq datasets obtained from storage roots, leaves, and seedling tissues at distinct developmental stages or subjected to different treatments^[18−20]. In addition, a total of 15 small RNA-seq libraries generated from different stages of storage root were used for small RNA identification (Supplemental Table S1)^[21].

Reads processing
Full-length reads were obtained with Pychopper v2 (https://github.com/epi2me-labs/pychopper), which identifies, orients, and trims full-length Nanopore cDNA reads. The full-length reads from each sample were then mapped to the I. batatas genome using Minimap2 v2.24^[22]. The mapped reads were processed with cDNA Cupcake (https://github.com/Magdoll/cDNA_Cupcake) to remove redundancy, requiring an alignment identity > 90% and an alignment coverage > 85%. Furthermore, 5' degraded reads were excluded to obtain a final set of nonredundant reads. For the Illumina RNA-seq data, the first 12 bp of each read were removed using fastp^[23]. The clean reads from each library were then aligned individually to the I. batatas genome^[5] using STAR^[24]. Only uniquely mapped reads were retained for further analysis.
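The identity and coverage thresholds above can be illustrated with a minimal Python sketch. It is not the cDNA Cupcake implementation: the `Alignment` record, its field names, and the way identity and coverage are computed here are assumptions chosen only to show how a "> 90% identity and > 85% coverage" rule behaves.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical per-read alignment summary (field names are illustrative)."""
    read_id: str
    read_length: int     # total length of the read in bases
    aligned_bases: int   # read bases covered by the alignment
    matching_bases: int  # aligned bases identical to the genome

    @property
    def identity(self) -> float:
        # Fraction of aligned bases that match the reference.
        return self.matching_bases / self.aligned_bases if self.aligned_bases else 0.0

    @property
    def coverage(self) -> float:
        # Fraction of the read that is included in the alignment.
        return self.aligned_bases / self.read_length if self.read_length else 0.0


def passes_filter(aln: Alignment, min_identity: float = 0.90,
                  min_coverage: float = 0.85) -> bool:
    """Keep an alignment only if both thresholds are strictly exceeded."""
    return aln.identity > min_identity and aln.coverage > min_coverage


alignments = [
    Alignment("read_a", 1000, 950, 930),  # high identity, high coverage
    Alignment("read_b", 1000, 800, 790),  # coverage 0.80, too low
    Alignment("read_c", 1000, 900, 780),  # identity about 0.87, too low
]
kept = [a.read_id for a in alignments if passes_filter(a)]
```

In this toy input only `read_a` satisfies both conditions; the other two reads fail on coverage and identity, respectively.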

Comprehensive transcriptome generation
The short reads from each library were assembled into transcripts using StringTie^[25]. To filter out weakly expressed isoforms, a minimum isoform fraction (-f) of 0.2 was applied. The refined Nanopore transcripts were mapped to the I. batatas genome using GMAP^[26], requiring an alignment identity > 90% and an alignment coverage > 85%. PASA^[27] was employed to construct the best gene models from the aligned Nanopore full-length reads. Finally, a comprehensive transcriptome was generated by integrating the genome-guided short-read assembly with the Nanopore full-length transcripts.
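As a rough illustration of what a minimum isoform fraction of 0.2 means, the sketch below keeps a transcript only if its abundance is at least 20% of the most abundant transcript at the same locus. This mirrors the general idea behind StringTie's `-f` option but is not StringTie's code; the function name, the input dictionaries, and the inclusive threshold are assumptions made for the example.

```python
from collections import defaultdict


def filter_isoforms(abundances, locus_of, min_fraction=0.2):
    """Return transcript IDs whose abundance is at least `min_fraction`
    of the most abundant transcript at the same locus.

    abundances: dict of transcript ID -> abundance (for example, coverage)
    locus_of:   dict of transcript ID -> locus ID
    """
    # Find the highest abundance observed at each locus.
    top = defaultdict(float)
    for tx, value in abundances.items():
        top[locus_of[tx]] = max(top[locus_of[tx]], value)

    # Keep isoforms that reach the required fraction of the locus maximum.
    return sorted(
        tx for tx, value in abundances.items()
        if top[locus_of[tx]] > 0 and value / top[locus_of[tx]] >= min_fraction
    )


# Toy data: three isoforms at locus g1 and a single isoform at locus g2.
abundances = {"t1": 50.0, "t2": 12.0, "t3": 5.0, "t4": 3.0}
locus_of = {"t1": "g1", "t2": "g1", "t3": "g1", "t4": "g2"}
kept_isoforms = filter_isoforms(abundances, locus_of)
```

Here `t3` is dropped because 5/50 is below 0.2, while `t4` is kept even though its absolute abundance is low, since the fraction is computed per locus.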

Gene structure annotation of the I. batatas genome
The annotation of the I. batatas genome drew on several sources of evidence. Initial gene models were generated using BRAKER2^[28], which integrated trained models from BRAKER with mapped full-length reads, intron hints converted from mapped Nanopore full-length reads, intron hints derived from mapped short Illumina reads, and protein hints converted from mapped UniProt plant proteins and Arabidopsis proteins. The I. batatas genome was also soft-masked for repeats.

To obtain consensus gene models, EVidenceModeler (EVM)^[29] was used. EVM combined six evidence sources: initial BRAKER gene models, mapped Nanopore full-length transcripts, genome-guided transcripts from Illumina RNA-seq, comprehensive transcriptome alignments from PASA, mapped UniProt proteins, and mapped Arabidopsis proteins. The consensus gene models were determined by nonstochastic weighting of the evidence, with weights of 3, 6, 5, 10, 2, and 2 assigned to these sources in the order listed. The gene models were then refined with PASA^[27], which added alternatively spliced isoforms and UTR annotations and modified gene structures. Finally, the new annotations were manually curated one by one using IGV-GSAman (v.0.6.83, https://tbtools.cowtransfer.com/s/a11146181df14f) to ensure quality and accuracy.
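The pairing of evidence sources and weights described above can be written out explicitly. The short Python sketch below only restates that mapping; the source labels are illustrative names rather than the labels used in the authors' EVM weights file, and `support_score` is a deliberately simplified toy score, not EVM's consensus algorithm.

```python
# Evidence sources in the order given in the text (labels are illustrative).
EVIDENCE_SOURCES = [
    "braker_models",         # initial BRAKER gene models
    "nanopore_transcripts",  # mapped Nanopore full-length transcripts
    "illumina_transcripts",  # genome-guided transcripts from Illumina RNA-seq
    "pasa_alignments",       # comprehensive transcriptome alignments from PASA
    "uniprot_proteins",      # mapped UniProt proteins
    "arabidopsis_proteins",  # mapped Arabidopsis proteins
]
WEIGHTS = [3, 6, 5, 10, 2, 2]

# Guard against a mismatch between the two lists before pairing them.
assert len(EVIDENCE_SOURCES) == len(WEIGHTS)
EVM_WEIGHTS = dict(zip(EVIDENCE_SOURCES, WEIGHTS))


def support_score(sources):
    """Toy score: sum of the weights of the sources backing a candidate model."""
    return sum(EVM_WEIGHTS[s] for s in sources)


# Sources ranked from most to least trusted.
ranked = sorted(EVM_WEIGHTS, key=EVM_WEIGHTS.get, reverse=True)

# A structure backed by PASA and Nanopore evidence outweighs one backed
# only by the ab initio prediction and a protein alignment.
score_a = support_score(["braker_models", "uniprot_proteins"])
score_b = support_score(["pasa_alignments", "nanopore_transcripts"])
```

The weights make the design choice visible: transcript-derived evidence, and PASA alignments in particular, is trusted more than ab initio predictions or protein homology.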

Functional annotation of gene models
GO terms, KEGG terms, and gene functions were annotated using eggNOG-mapper (v.2.1.9) (http://eggnog-mapper.embl.de). Protein sequences were submitted to both the eggNOG-mapper and KOBAS websites and analyzed with default settings. Additionally, iTAK (v.1.6, http://itak.feilab.net/cgi-bin/itak/index.cgi) was used to identify transcription factors and protein kinases.

Identification of miRNAs and their target genes
The identification of sweetpotato miRNAs followed a previously described workflow^[30,31]. Briefly, the reads obtained from the five stages of storage root^[21] were combined and processed. This involved discarding low-quality reads, trimming adapters, and collapsing identical small RNA reads using the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit). The collapsed reads were then aligned to the 'Taizhong 6' genome using Bowtie1^[32], allowing one mismatch. Subsequently, small RNAs with a length of 20–22 nucleotides and ≤ 20 genomic matches were screened for stem-loop structures, allowing a maximum of four mispairings and ≤ 1 central bulge. The identified miRNAs were searched against miRBase (www.mirbase.org, v22) using BLAST to identify conserved plant miRNAs, allowing up to two mismatches. TargetFinder 1.7^[33] was used to predict the target genes of the miRNAs within the v1.0.a2 gene set. Target prediction accepted alignment scores up to 5, where a lower score indicates a better alignment between the miRNA and its target^[34].
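Two of the numeric filters in this workflow, the small RNA length and genomic-match limits and the target score cutoff, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the tuple layouts, function names, sequences, and gene IDs are made up for the example, and the stem-loop screening step is not modeled.

```python
def is_candidate(sequence, genomic_hits, min_len=20, max_len=22, max_hits=20):
    """Keep small RNAs of 20-22 nt that map to at most 20 genomic positions."""
    return min_len <= len(sequence) <= max_len and genomic_hits <= max_hits


def keep_targets(predictions, max_score=5.0):
    """Keep (miRNA, target, score) predictions with a score of at most
    `max_score`; lower scores indicate better miRNA-target alignments."""
    return [(mirna, target, score) for mirna, target, score in predictions
            if score <= max_score]


# Made-up small RNAs: (name, sequence, number of genomic matches).
small_rnas = [
    ("s1", "ACGU" * 5 + "A", 3),  # 21 nt, 3 matches: kept
    ("s2", "ACGU" * 6, 1),        # 24 nt: too long
    ("s3", "ACGU" * 5, 45),       # 20 nt but 45 matches: too repetitive
]
candidates = [name for name, seq, hits in small_rnas if is_candidate(seq, hits)]

# Made-up predictions; the gene IDs only imitate the IbXXGXXXXX pattern.
predictions = [
    ("s1", "Ib01G00010", 2.5),
    ("s1", "Ib02G00020", 5.0),
    ("s1", "Ib03G00030", 6.5),
]
targets = keep_targets(predictions)
```

With these inputs only `s1` remains a candidate, and its first two predicted targets pass the score cutoff because the threshold is treated as inclusive.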

Results and discussion


Conclusions

In this study, we have significantly enhanced the annotation of the high-quality genome assembly of hexaploid sweetpotato (I. batatas), resulting in a new annotation referred to as v1.0.a2. The annotation process used 12 Nanopore long-read sequencing datasets obtained from storage roots of both white-fleshed and purple-fleshed sweetpotatoes at various developmental stages, together with 190 Illumina short-read sequencing datasets. In the v1.0.a2 annotation, a total of 360 newly discovered genes were identified. Furthermore, we modified or added 31,771 gene models, incorporating transcript isoforms and expanding information on 5' and 3' untranslated regions (UTRs). Additionally, we annotated miRNAs and reported their expression profiles across different storage root stages and their targets. Overall, the improved annotation, v1.0.a2, represents a valuable resource for genomic analyses within the Convolvulaceae family and serves as an essential reference for gene function studies in cultivated sweetpotatoes. The incorporation of newly discovered genes, refined gene models, and miRNA data enhances our understanding of sweetpotato genomics and facilitates further research in this field.

Author contributions

The authors confirm contribution to the paper as follows: study conception and design: Li Y, Liu T, Zhu G; data analysis: Liang B, Zhou Y, Liu T, Wang M, Li Y, Liu Y, Liu YH; draft manuscript preparation and revision: Li Y, Liu T, Zhu G. All authors reviewed the results and approved the final version of the manuscript.

Data availability
