Based on long-read data, the mitogenomes of the Morus species showed complex assembly graphs mediated by different numbers of long repeats (LRs) or node branches (Fig. 1). One pair of LRs was present in each of the mitogenomes of M. notabilis and M. alba 'Zhongsang5801' (M. alba-ZS5801; Fig. 1a, c), three pairs of LRs occurred in M. alba 'Zhenzhubai' (M. alba-ZZB; Fig. 1d), and no LR-mediated multiple conformations were detected in M. mongolica (Fig. 1b). These repeats were resolved by simulating the four possible paths and evaluating them against the long-read mapping results, and this strategy yielded the dominant mitogenome conformation for each Morus accession. In M. notabilis, a 9,951-bp repeat (LR1) mediated genomic recombination and formed either one master circle or two small circles with equal probability (Fig. 2a); the master circle, which represents the complete mitogenome, was used for subsequent analysis. In M. mongolica, the complex structure was resolved into one linear and one circular molecule after calculating the depth at the node (Fig. 2b). The two cultivars of M. alba showed completely different mitogenomic conformations. In 'Zhongsang5801', recombination mediated by one pair of LRs (LR1) resolved the mitogenome into one linear and one circular molecule (Fig. 2c). In 'Zhenzhubai', three pairs of LRs (LR1, LR2, and LR3) divided the complex structure into three circular molecules and one linear molecule (Fig. 2d). Finally, the paired-end reads and long reads were mapped to the mitogenomes, and the sequencing depth statistics showed that high-quality, gap-free genome assemblies had been obtained (Supplementary Fig. S1).
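The path-based resolution described above can be illustrated with a minimal sketch. It assumes that, for one repeat pair R with flanking segments a, b, c, and d, the four possible paths are the two reference junctions (a-R-b, c-R-d) and the two recombinant junctions (a-R-d, c-R-b), and that each long read spanning the repeat has already been assigned to one path. The flank labels, function names, and read counts are hypothetical; the authors' actual pipeline is not described in this section.

```python
from typing import Dict, Tuple

# Four possible paths across one repeat pair R (flank labels are hypothetical).
REFERENCE_PATHS = ("a-R-b", "c-R-d")    # junctions of conformation 1
RECOMBINANT_PATHS = ("a-R-d", "c-R-b")  # junctions of conformation 2


def conformation_support(span_counts: Dict[str, int]) -> Tuple[float, float]:
    """Return the fraction of repeat-spanning reads supporting each conformation."""
    ref = sum(span_counts.get(p, 0) for p in REFERENCE_PATHS)
    rec = sum(span_counts.get(p, 0) for p in RECOMBINANT_PATHS)
    total = ref + rec
    if total == 0:
        raise ValueError("no long reads span the repeat")
    return ref / total, rec / total


def dominant_conformation(span_counts: Dict[str, int]) -> str:
    """Name the better-supported conformation, or 'tie' if support is equal."""
    ref, rec = conformation_support(span_counts)
    if ref > rec:
        return "reference"
    if rec > ref:
        return "recombinant"
    return "tie"


# Hypothetical counts of long reads spanning each path.
counts = {"a-R-b": 30, "c-R-d": 30, "a-R-d": 10, "c-R-b": 10}
print(conformation_support(counts), dominant_conformation(counts))
```

Equal support for both conformations, as reported for LR1 in M. notabilis, would correspond to the "tie" outcome in this sketch.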
Figure 1.
Mitogenome assembly graph and possible connections (black lines) mediated by repeats for (a) M. notabilis, (b) M. mongolica, (c) M. alba-ZS5801, and (d) M. alba-ZZB. Each colored segment is labeled with its name and coverage. The boxes represent the different mitogenome conformations mediated by each pair of repeats and the probability of being supported.
Figure 2.
The mitogenome map of Morus species. (a) M. notabilis, (b) M. mongolica, (c) M. alba-ZS5801, and (d) M. alba-ZZB. Genes belonging to different functional groups are color-coded.
The complete mitogenomes of the four Morus accessions ranged from 359,062 bp (M. alba-ZZB) to 376,846 bp (M. mongolica), revealing considerable size variation among the Morus species and cultivars (Fig. 2). The GC content of the four mitogenomes varied slightly, from 45.4% (M. alba-ZS5801) to 45.7% (M. notabilis). The four mitogenomes encoded different gene sets, with gene numbers ranging from 49 to 55, including 31−32 PCGs, 15−20 tRNA genes, and three rRNA genes (Supplementary Fig. S2, Table 1). Among these genes, 30 PCGs, 14 tRNA genes, and all three rRNA genes (rrn5, rrn18, and rrn26) were shared by all four mitogenomes. Seven PCGs (ccmFC, cox2, nad1, nad2, nad4, nad5, and nad7) and two tRNA genes (trnA-UGC and trnI-GAU) contained one or more introns (Table 1). Among the PCGs, rps3 and rps13 were lost in M. alba-ZZB and M. alba-ZS5801, respectively; three PCGs (atp9, nad3, and rps12) had two copies in M. alba-ZZB, and two genes (cob and matR) had two copies in M. alba-ZS5801. Among the tRNA genes, three, five, and four were missing in M. mongolica, M. alba-ZZB, and M. alba-ZS5801, respectively. Only trnQ-UUG had two copies in M. alba-ZS5801, and two tRNA genes (trnP-UGG and trnW-CCA) had one more copy in M. notabilis than in the other three genomes.
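The shared and missing tRNA genes reported above can be rechecked with simple set operations. The sketch below transcribes the tRNA gene names from Table 1 (copy numbers and intron marks are ignored); the variable names are illustrative only.

```python
# tRNA genes listed for M. notabilis in Table 1 (the most complete set).
notabilis = set(
    "trnA-UGC trnC-GCA trnD-GUC trnE-UUC trnF-GAA trnfM-CAU trnG-GCC "
    "trnI-CAU trnI-GAU trnK-UUU trnL-CAA trnM-CAU trnN-GUU trnP-UGG "
    "trnQ-UUG trnR-ACG trnS-UGA trnV-GAC trnW-CCA trnY-GUA".split()
)

# The other accessions, expressed as the M. notabilis set minus the genes
# that are absent from their Table 1 entries.
trna_sets = {
    "M. notabilis": notabilis,
    "M. mongolica": notabilis - {"trnG-GCC", "trnI-GAU", "trnV-GAC"},
    "M. alba-ZS5801": notabilis - {"trnA-UGC", "trnG-GCC", "trnI-GAU", "trnV-GAC"},
    "M. alba-ZZB": notabilis - {"trnA-UGC", "trnI-GAU", "trnL-CAA", "trnR-ACG", "trnV-GAC"},
}

# Genes present in every accession.
shared = set.intersection(*trna_sets.values())
print(len(shared), "tRNA genes shared by all four mitogenomes")

# Genes missing from each accession relative to M. notabilis.
for name, genes in trna_sets.items():
    print(name, len(genes), sorted(notabilis - genes))
```

With the Table 1 entries as transcribed here, the intersection contains 14 genes, and the per-accession totals are 20, 17, 16, and 15.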
Table 1. Summary of genes contained in the four Morus mitogenomes.
| Category | Group of genes | M. notabilis | M. mongolica | M. alba-ZS5801 | M. alba-ZZB |
| --- | --- | --- | --- | --- | --- |
| Core genes | ATP synthase | atp1, atp4, atp6, atp8, atp9 | atp1, atp4, atp6, atp8, atp9 | atp1, atp4, atp6, atp8, atp9 | atp1, atp4, atp6, atp8, atp9 (2) |
| | Cytochrome c biogenesis | ccmB, ccmC, ccmFC*, ccmFN | ccmB, ccmC, ccmFC*, ccmFN | ccmB, ccmC, ccmFC*, ccmFN | ccmB, ccmC, ccmFC*, ccmFN |
| | Ubiquinol cytochrome c reductase | cob | cob | cob (2) | cob |
| | Cytochrome c oxidase | cox1, cox2*, cox3 | cox1, cox2*, cox3 | cox1, cox2*, cox3 | cox1, cox2*, cox3 |
| | Maturases | matR | matR | matR (2) | matR |
| | Transport membrane protein | mttB | mttB | mttB | mttB |
| | NADH dehydrogenase | nad1*, nad2****, nad3, nad4***, nad4L, nad5**, nad6, nad7****, nad9 | nad1*, nad2***, nad3, nad4***, nad4L, nad5**, nad6, nad7****, nad9 | nad1*, nad2***, nad3, nad4***, nad4L, nad5**, nad6, nad7****, nad9 | nad1*, nad2***, nad3 (2), nad4***, nad4L, nad5**, nad6, nad7****, nad9 |
| Variable genes | Large subunit of ribosome | rpl16 | rpl16 | rpl16 | rpl16 |
| | Small subunit of ribosome | rps3*, rps4, rps7, rps12, rps13, rps19 | rps3, rps4, rps7, rps12, rps13, rps19 | rps3, rps4, rps7, rps12, rps19 | rps4, rps7, rps12 (2), rps13, rps19 |
| | Succinate dehydrogenase | sdh4 | sdh4 | sdh4 | sdh4 |
| rRNA genes | Ribosomal RNAs | rrn5, rrn18, rrn26 | rrn5, rrn18, rrn26 | rrn5, rrn18, rrn26 | rrn5, rrn18, rrn26 |
| tRNA genes | Transfer RNAs | trnA-UGC*, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU, trnG-GCC, trnI-CAU, trnI-GAU*, trnK-UUU, trnL-CAA, trnM-CAU, trnN-GUU, trnP-UGG (3), trnQ-UUG, trnR-ACG, trnS-UGA, trnV-GAC, trnW-CCA (2), trnY-GUA | trnA-UGC*, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU, trnI-CAU, trnK-UUU, trnL-CAA, trnM-CAU, trnN-GUU, trnP-UGG (2), trnQ-UUG, trnR-ACG, trnS-UGA, trnW-CCA, trnY-GUA | trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU, trnI-CAU, trnK-UUU, trnL-CAA, trnM-CAU, trnN-GUU, trnP-UGG (2), trnQ-UUG (2), trnR-ACG, trnS-UGA, trnW-CCA, trnY-GUA | trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU, trnG-GCC, trnI-CAU, trnK-UUU, trnM-CAU, trnN-GUU, trnP-UGG (2), trnQ-UUG, trnS-UGA, trnW-CCA, trnY-GUA |

The number of asterisks indicates the number of introns in the gene. The number in parentheses indicates the number of copies of the gene.
Repeat sequences analysis
The dispersed repeats within each Morus mitogenome fell into two major types: forward repeats, numbering from 105 (M. notabilis) to 141 (M. mongolica), and palindromic repeats, numbering from 103 (M. notabilis) to 125 (M. alba-ZZB). In addition, one complementary repeat was found in the three mitogenomes other than M. notabilis (Fig. 3a). No reverse repeats were observed in any of the four mitogenomes. The dispersed repeats ranged from 30 to 22,006 bp in length, and eight were longer than 1 kb (Supplementary Table S3). Most were located in non-coding regions, and some were found in PCGs such as ccmFC, nad2, nad4, nad5, nad7, cob, cox2, matR, and atp9. Additionally, 13−16 tandem repeats with repeat lengths of 12 to 39 bp were detected in the four genomes, most of them 15−25 bp long (Fig. 3a; Supplementary Table S4). Moreover, 91 (M. notabilis) to 109 (M. alba-ZS5801) microsatellites were detected across the four Morus mitogenomes, comprising 34−48 mono-, 10−15 di-, 11−16 tri-, 29−31 tetra-, 1−2 penta-, and 1−2 hexanucleotide repeat motifs (Fig. 3b).
Figure 3.
Repeat sequences in the Morus mitogenomes. (a) Type and number of dispersed repeats and tandem repeats, (b) type and number of simple sequence repeats (SSRs).
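To illustrate how microsatellites of the six motif classes counted above can be located, the following is a minimal regex-based sketch. The minimum repeat thresholds (10, 5, 4, 3, 3, and 3 for mono- to hexanucleotide motifs) are an assumption chosen for illustration, not the settings used in this study, and `find_ssrs` is a hypothetical helper rather than part of any published tool.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Assumed minimum number of repeat units per motif length (illustrative only).
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. "AA" reported as a dinucleotide
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence containing one mono-, one di-, and one trinucleotide SSR.
toy = "G" + "A" * 12 + "C" + "AT" * 6 + "G" + "AGC" * 4 + "T"
ssrs = find_ssrs(toy)
print(ssrs)
print(Counter(len(motif) for _, motif, _ in ssrs))
```

The sketch reports only perfect repeats and does not merge compound SSRs, so its counts would not necessarily match those of a dedicated SSR-detection tool.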
Codon bias analysis of mitogenome
Codon usage was analyzed in the 32 PCGs of the M. notabilis and M. mongolica mitogenomes, the 33 PCGs of M. alba-ZS5801, and the 34 PCGs of M. alba-ZZB. A total of 9,619, 9,629, 10,552, and 9,617 codons were identified, respectively, and codon usage and amino acid frequencies were highly similar across all four mitogenomes (Fig. 4; Supplementary Table S5). Most of the PCGs used AUG as the start codon, whereas GUG and AUA served as the start codons of rpl16 and nad1, respectively, in all four accessions. Three stop codons (UAA, UAG, and UGA) were identified in the PCGs, of which UAA (RSCU > 1) was the most preferred. Among the amino acid-encoding codons, 31 (M. notabilis), 30 (M. mongolica), 33 (M. alba-ZS5801), and 31 (M. alba-ZZB) had RSCU > 1, indicating that these codons were used more frequently than their synonymous alternatives. Among them, the GCU codon for Ala had the highest RSCU value (1.58−1.64), whereas the CAG codon for Gln had the lowest (0.44−0.47).
Figure 4.
Relative synonymous codon usage (RSCU) in the four Morus mitogenomes. (a) M. notabilis, (b) M. mongolica, (c) M. alba-ZS5801, and (d) M. alba-ZZB.
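For reference, RSCU is conventionally defined as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid, so a value above 1 indicates a codon used more often than expected under equal usage. The sketch below applies that definition to one synonymous family; the codon counts are hypothetical and are not taken from Supplementary Table S5.

```python
from typing import Dict


def rscu(family_counts: Dict[str, int]) -> Dict[str, float]:
    """Compute RSCU for the codons of one synonymous family.

    RSCU = observed count / (total count of the family / number of codons).
    """
    total = sum(family_counts.values())
    if total == 0:
        raise ValueError("amino acid is not used; RSCU is undefined")
    expected = total / len(family_counts)  # count under equal codon usage
    return {codon: count / expected for codon, count in family_counts.items()}


# Hypothetical counts for the four alanine codons.
ala_counts = {"GCU": 40, "GCC": 20, "GCA": 30, "GCG": 10}
for codon, value in rscu(ala_counts).items():
    print(codon, round(value, 2))
```

In this toy family, GCU has an RSCU of 1.6 and would be classed as preferred, while GCG (0.4) would be classed as under-used.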
Plastid DNA transfer to mitogenome (MTPTs)
The plastomes of the four Morus accessions assembled in this study varied slightly in length, ranging from 159,136 bp (M. mongolica) to 159,546 bp (M. notabilis; Fig. 5). To avoid redundant detections, only a single IR region of each plastome was used for analysis, and 38, 39, 35, and 40 mitochondrial-plastid DNA transfers (MTPTs) were identified in the mitogenomes of M. notabilis, M. mongolica, M. alba-ZS5801, and M. alba-ZZB, respectively (Fig. 5; Supplementary Table S6). These MTPTs had a combined length of 26,024 bp (M. notabilis), 21,398 bp (M. mongolica), 17,698 bp (M. alba-ZS5801), and 10,112 bp (M. alba-ZZB), accounting for 7.19%, 5.68%, 4.86%, and 2.82% of the respective mitogenomes. In each accession, the majority of transferred fragments were shorter than 1,000 bp, and a total of 14 MTPTs longer than 1,000 bp were found across all the mitogenomes. Both the longest (16,238 bp) and the shortest (30 bp) MTPTs were found in M. notabilis. The MTPTs were then extracted and annotated. Five intact tRNA genes (trnP-UGG, trnW-CCA, trnD-GUC, trnM-CAU, and trnN-GUU) were identified as having been transferred from the plastome to the mitogenome in all accessions. Two intact tRNA genes, trnL-CAA and trnR-ACG, were transferred to all mitogenomes except that of M. alba-ZZB; trnI-GAU and trnV-GAC were transferred only in M. notabilis, and trnA-UGC was transferred in M. notabilis and M. mongolica. In addition, only partial fragments of trnF-GAA and trnQ-UUG were identified in the four mitogenomes, suggesting that these genes might have undergone rapid sequence divergence during migration. For PCGs, only fragments of rps12 were transferred to the four mitogenomes.
Figure 5.
Schematic representation of the distribution of MTPTs between plastome and mitogenomes for (a) M. notabilis, (b) M. mongolica, (c) M. alba-ZS5801, and (d) M. alba-ZZB. The MTPTs on the plastid IR regions were counted only once. Different colors of ribbons represent different identities. The length of each MTPT and the genes it contains can be found in Supplementary Table S6.
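The proportion of each mitogenome occupied by MTPTs is the combined MTPT length divided by the mitogenome length. As a brief worked check, the sketch below recomputes the percentage for the two accessions whose mitogenome sizes are stated in this section (M. mongolica, 376,846 bp; M. alba-ZZB, 359,062 bp). The function name is illustrative, and the sizes of the other two mitogenomes are not given here, so they are omitted.

```python
def mtpt_percentage(mtpt_total_bp: int, mitogenome_bp: int) -> float:
    """Percentage of the mitogenome length made up of plastid-derived fragments."""
    if mitogenome_bp <= 0:
        raise ValueError("mitogenome length must be positive")
    return 100.0 * mtpt_total_bp / mitogenome_bp


# (combined MTPT length, mitogenome length) in bp, as reported in the text.
reported = {
    "M. mongolica": (21_398, 376_846),
    "M. alba-ZZB": (10_112, 359_062),
}

for name, (mtpt_bp, genome_bp) in reported.items():
    print(f"{name}: {mtpt_percentage(mtpt_bp, genome_bp):.2f}%")
```

Both values round to the percentages reported above (5.68% and 2.82%).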
Phylogenetic analyses
For the phylogenetic inference, 25 individual PCG alignments were concatenated to generate a combined matrix of 42 accessions with a length of 24,627 bp. The best-fit nucleotide substitution model for the concatenated matrix was GTR + I + G. The tree topologies reconstructed by the ML and BI analyses were completely consistent (Fig. 6). In the phylogenetic trees, most nodes received maximal ML bootstrap (BS) or Bayesian posterior probability (PP) support, and all the species were grouped into six families, with different species of the same genus forming monophyletic clades. Within Rosales, the six families were divided into two clades with maximal support (BS/PP = 100/1). Two subfamilies of Rosaceae, Amygdaloideae and Rosoideae, comprised one clade. In the other clade (BS/PP = 100/1), the family Rhamnaceae was the first diverging lineage, followed by a clade comprising two Hippophae species from the family Elaeagnaceae. Of the remaining three families, Moraceae and Cannabaceae formed a monophyletic group (BS/PP = 100/1), which was in turn sister to the family Ulmaceae. Within the family Moraceae, ten species from five genera grouped into two clades. Four individuals of three Morus species formed one of the clades (BS/PP = 100/1), in which M. notabilis diverged first and the two individuals of M. alba were not resolved as monophyletic. Ficus carica occupied the basal position of the other clade with moderate support (BS/PP = −/0.76), followed by a maximally supported clade (BS/PP = 100/1) comprising Malaisia scandens and Allaeanthus kurzii, which was then sister to the genus Broussonetia. Among the three Broussonetia species, B. papyrifera was well supported as sister to the B. kaempferi and B. monoica clade.
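The concatenation step described above can be sketched as follows. This is a minimal illustration, not the authors' workflow: it assumes each gene alignment is a mapping from accession name to an aligned sequence, and that an accession missing from a gene alignment is padded with gaps, which this section does not specify. The toy alignments and names are hypothetical.

```python
from typing import Dict, List


def concatenate_alignments(gene_alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments into one supermatrix keyed by accession name."""
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment differ in length")
        (length,) = lengths
        for taxon in taxa:
            # Assumption: accessions lacking this gene are filled with gaps.
            parts[taxon].append(aln.get(taxon, "-" * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Two toy gene alignments; taxon_C lacks the second gene.
gene1 = {"taxon_A": "ATGC", "taxon_B": "ATGA", "taxon_C": "ATGT"}
gene2 = {"taxon_A": "GGC", "taxon_B": "GGT"}
matrix = concatenate_alignments([gene1, gene2])
for taxon, seq in matrix.items():
    print(taxon, seq)
```

Every row of the resulting matrix has the same length, which is the sum of the individual alignment lengths (24,627 bp for the 25 PCG alignments used here).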
-
In the present study, four high-quality mitogenomes of M. notabilis, M. mongolica, M. alba 'Zhongsang5801', and M. alba 'Zhenzhubai' were assembled using a hybrid strategy, and various aspects were explored, including genome structure, gene transfer, and phylogenetic implications. The mitogenomes were structurally heterogeneous across the four Morus accessions, and multiple genome conformations existed simultaneously for each species. The gene content was relatively conserved, especially for PCGs, but the repeat sequences and foreign gene transfer sites varied widely among the four mitogenomes. The phylogeny of Rosales was investigated based on mitogenome sequences. The phylogenetic position of Rhamnaceae differed strongly from that in earlier plastid and nuclear phylogenies, which provides new insights into the evolution of Rosales.
-
About this article
Cite this article
Liu L, Long Q, Lv W, Qian J, Egan AN, et al. 2024. Long repeat sequences mediated multiple mitogenome conformations of mulberries (Morus spp.), an important economic plant in China. Genomics Communications 1: e005 doi: 10.48130/gcomm-0024-0005
Long repeat sequences mediated multiple mitogenome conformations of mulberries (Morus spp.), an important economic plant in China
- Received: 24 October 2024
- Revised: 15 November 2024
- Accepted: 20 November 2024
- Published online: 28 November 2024
Abstract: Mulberries (genus Morus; Moraceae) hold significant economic value in sericulture and have great potential in the horticultural industry, food industry, and human health arenas worldwide. Since the advent of the genomic era, genomic resources for Morus species, such as whole-genome and plastome sequences, have been reported, but mitochondrial genome sequences remain relatively scarce, which greatly hinders a comprehensive understanding of the evolutionary history and processes at work within Morus. Here, four Morus mitogenomes were assembled using Illumina and PacBio HiFi data. The results showed that the structure of the four mitogenomes was highly heterogeneous owing to different numbers of repeat-mediated recombination events, with multiple conformations existing simultaneously in the mitogenome of each species. The genome size ranged from 359,062 to 376,846 bp. The repeat sequences and gene transfers from plastome to mitogenome varied widely among the four mitogenomes, which was the main cause of variation in mitogenome size. Finally, the evolutionary history of Rosales was inferred based on the mitogenome sequences. The analyses revealed a strong difference in the phylogenetic placement of Rhamnaceae compared to earlier plastid or nuclear phylogenies, likely due to the effects of ancient hybridizations. Overall, the results presented here will provide important genetic resources for the utilization of this important economic plant.
-
Key words:
- Mulberry
- Mitogenome
- Repeat-mediated recombination
- Phylogeny
- Rosales