In this study, the pan-plastome structure of peas was elucidated (Fig. 1). The plastomes ranged in length from 120,826 to 122,547 bp. In contrast to typical plastomes, which have a quadripartite structure with two inverted repeat (IR) copies, the pea plastomes contained a single IR copy. The overall GC content varied minimally among the pea plastomes, from 34.74% to 34.87%, with an average of 34.8%.
Figure 1.
Pea pan-plastome annotation map. Genes listed inside and outside the circle are transcribed clockwise and counterclockwise, respectively, as indicated by arrows. Genes are color-coded by functional classification. The GC content is displayed as black bars in the second inner circle. Variants are divided into four categories: single nucleotide variants (SNVs), block substitutions (BS, two or more consecutive nucleotide variants), nucleotide insertions or deletions (InDels), and mixed sites (two or more of the preceding three variant types at a particular site). SNVs, InDels, block substitutions, and mixed variants are represented with purple, green, red, and black lines, respectively.
A total of 110 unique genes were annotated (Supplementary Table S2), of which 76 were protein-coding genes (PCGs), 30 were transfer RNA (tRNA) genes, and four were ribosomal RNA (rRNA) genes. Genes containing a single intron included nine PCGs (rpl16, rpl2, ndhB, ndhA, petB, petD, rpoC1, clpP, atpF) and six tRNA genes (trnK-UUU, trnV-UAC, trnL-UAA, trnA-UGC, trnI-GAU, trnG-UCC). Additionally, two PCGs, ycf3 and rps12, were found to contain two introns.
Codon usage and simple sequence repeats (SSRs) patterns in peas
The codon usage frequency in pea plastome genes is shown in Fig. 2a. The analysis indicated marked biases toward specific codons: for most amino acids, the synonymous codons were not used evenly. Nearly even usage was observed for only a few amino acids, such as alanine (Ala) and valine (Val). Stop codon usage was also nearly even, with 37.0% for TAA, 32.2% for TAG, and 30.8% for TGA.
Figure 2.
(a) The overall codon usage frequency in 51 CDSs (length > 300 bp) from the pea pan-plastome. (b) The heatmap of RSCU values in 51 CDSs (length > 300 bp) from the pea pan-plastome. The x-axis represents different codons and the y-axis represents different CDSs. The tree at the top was constructed based on a Neighbor-Joining algorithm.
The RSCU heatmap (Fig. 2b) showed that RSCU values differed among codons and among plastome CDSs. In general, a usage bias toward A/T in the third codon position was found among CDSs in the pea pan-plastome. The RSCU values among these CDSs ranged from 0 to 4.8. The highest value (4.8) was found for the CGT codon in the cemA gene: six synonymous codons exist for Arg, but only CGT (4.8) and AGG (1.2) were used in this gene. The use of only two of the six codons largely explains the extreme RSCU value for CGT and the resulting extreme codon usage bias for this amino acid.
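The arithmetic behind the cemA example can be illustrated with a minimal sketch. It assumes the usual definition of RSCU (the observed count of a codon divided by the mean count across all synonymous codons for that amino acid) and the standard codon assignments. The function name `rscu` and the toy sequence are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter

# Standard codon assignments, built in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def rscu(cds):
    """Return {codon: RSCU} for one coding sequence (hypothetical helper)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group synonymous codons by the amino acid (or stop) they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in family:
            # observed count / (family total / number of synonymous codons)
            values[c] = counts[c] * len(family) / total
    return values


# Toy sequence (not the real cemA gene): Arg encoded four times by CGT
# and once by AGG, with the other four Arg codons unused.
toy = "CGT" * 4 + "AGG"
values = rscu(toy)
print(round(values["CGT"], 2), round(values["AGG"], 2))  # 4.8 1.2
```

With six synonymous codons and only two in use at a 4:1 ratio, the values reproduce the 4.8 and 1.2 pattern described for Arg, and the unused Arg codons receive an RSCU of 0.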
In the ENC-GC3s plot (Fig. 3), 31 PCGs fell below the standard curve, while 20 PCGs fell above it. About 12 PCGs lay near the curve, suggesting that these PCGs were under average levels of natural selection and mutation pressure. Overall, the plot indicated that codon usage preferences in the pea pan-plastome were mostly influenced by natural selection. Five genes showed an especially strong influence of natural selection, with ΔENC (ENCexpected − ENC) values greater than 5: petB (ΔENC = 5.18), psbA (ΔENC = 8.96), rpl16 (ΔENC = 5.62), rps14 (ΔENC = 14.29), and rps18 (ΔENC = 6.46) (Supplementary Table S3).
Figure 3.
The ENC-GC3s plot for the pea pan-plastome, with GC3s as the x-axis and ENC as the y-axis. The expected ENC values (standard curve) are calculated according to the formula $\mathrm{ENC} = 2 + \mathrm{GC3s} + 29 / [\mathrm{GC3s}^2 + (1 - \mathrm{GC3s})^2]$.
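The standard curve and the ΔENC threshold can be sketched in a few lines. The formula is the one given in the caption, and the cutoff of 5 is the one used in the text. The gene labels, GC3s values, and observed ENC values below are hypothetical placeholders, not data from Supplementary Table S3.

```python
def expected_enc(gc3s):
    """Expected ENC for a given GC3s, using the formula in the Fig. 3 caption."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def delta_enc(gc3s, observed_enc):
    """ΔENC = ENCexpected - ENC; positive values lie below the standard curve."""
    return expected_enc(gc3s) - observed_enc


# Hypothetical (label, GC3s, observed ENC) triples, for illustration only.
genes = [("geneA", 0.25, 40.0), ("geneB", 0.30, 51.0)]

for name, gc3s, enc in genes:
    d = delta_enc(gc3s, enc)
    note = "ΔENC > 5" if d > 5 else "within 5 of the curve"
    print(f"{name}: expected ENC = {expected_enc(gc3s):.2f}, ΔENC = {d:.2f} ({note})")
```

The curve peaks where GC3s = 0.5, so a gene with strongly skewed third-position composition has a lower expected ENC even without any selection on codon choice.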
For SSR detection (Fig. 4), mononucleotide, dinucleotide, and trinucleotide repeats were identified in the pea pan-plastome, namely A/T, AT/TA, and AAT/ATT. The majority of these SSRs were mononucleotide repeats (A/T), accounting for over 90% of all identified repeats. A/T and AT/TA repeats were present in all pea accessions, whereas only about half of the plastomes contained AAT/ATT repeats. The number of A/T repeats exhibited the greatest diversity among accessions, while the number of AAT/ATT repeats converged on similar values in all plastomes that possessed this repeat.
Figure 4.
Simple sequence repeats (SSRs) in the pea pan-plastome. The x-axis represents different samples of pea and the y-axis represents the number of repeats in each sample. (a) The number of A/T repeats in the pea pan-plastome. (b) The number of AT/TA and AAT/ATT repeats in the pea pan-plastome.
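The three repeat classes reported for the pea pan-plastome can be located with a simple pattern search. The sketch below is not the tool used in the study. The minimum repeat counts (10, 5, and 4 for mono-, di-, and trinucleotide motifs) and the function name `find_ssrs` are assumptions chosen for illustration, and the test sequence is synthetic.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 5, 3: 4}


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least min_rep - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if unit_len > 1 and len(set(motif)) == 1:
                continue  # e.g. "AA" x 6 is a mononucleotide run, not a dinucleotide SSR
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Synthetic sequence with one repeat of each class, separated by "GC".
demo = "GC" + "A" * 12 + "GC" + "AT" * 6 + "GC" + "AAT" * 4 + "GC"
for start, motif, copies in find_ssrs(demo):
    print(start, motif, copies)
```

Counting the returned tuples per motif and per accession would give per-sample tallies of the kind plotted in Fig. 4, although the exact numbers depend on the thresholds chosen.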
Phylogenetic analysis
To better understand the phylogenetic relationships and evolutionary history of peas, a maximum likelihood phylogenetic tree was reconstructed for 145 pea accessions from whole plastome sequences (Fig. 5a). The 145 accessions were grouped into seven clades with high confidence: the 'PF group', 'PSeI-a group', 'PSeI-b group', 'PA group', 'PSeII group', 'PSeIII group', and 'PS group'. Each group is named after the taxon to which most of its accessions belong: P. fulvum for the 'PF group'; P. sativum subsp. elatius for the 'PSeI-a group', 'PSeI-b group', 'PSeII group', and 'PSeIII group'; P. abyssinicum for the 'PA group'; and P. sativum for the 'PS group'. The 'PSeI-a group' and 'PSeI-b group' were closely related, and all accessions in these two groups except DCG0709, which was P. sativum, were identified as P. sativum subsp. elatius. In addition to the P. sativum subsp. elatius accessions found in PSeI, seven accessions from the PS group were identified as P. sativum subsp. elatius.
The PCA results (Fig. 5b) also indicated that the domesticated P. abyssinicum accessions clustered closer to PSeI and PSeII, while PSeIII clustered more closely with cultivated varieties of P. sativum. A previous study indicated that P. sativum subsp. sativum and P. abyssinicum were independently domesticated from different P. sativum subsp. elatius populations[34].
The complete plastome sequences were used for haplotype analysis with the TCS and median-joining network methods (Fig. 5c). A total of 76 haplotypes were identified. The TCS network resolved a pattern similar to that of the other analyses: six genetic clusters were resolved, with clusters PS and PSeIII being very closely related. The cluster containing P. fulvum exhibited greater genetic distance from the other clusters. The clusters containing P. abyssinicum (PA) and P. sativum (PS) had lower levels of intracluster differentiation. In the TCS network, Hap30 and Hap31 formed clusters distinct from other haplotypes, such as Hap27, which may account for the genetic difference between the 'PSeI-a group' and 'PSeI-b group'. The network results were consistent with the phylogenetic tree and the principal component analysis in this study.
Figure 5.
(a) An ML tree resolved from 145 pea plastomes. (b) PCA showing the first two components. (c) Haplotype network of pea plastomes. The size of each circle is proportional to the number of accessions with the same haplotype. (d) Genetic diversity and differentiation of six clades of peas. The numbers above the lines joining two bubbles represent the pairwise FST between the corresponding genetic clusters.
Among the six genetic clusters, the highest haplotype diversity (Hd) was observed in PSeIII (Hd = 0.99, π = 0.22 × 10⁻³), followed by PSeII (Hd = 0.96, π = 0.43 × 10⁻³), PSeI (Hd = 0.96, π = 0.94 × 10⁻³), PF (Hd = 0.94, π = 0.6 × 10⁻⁴), PS (Hd = 0.88, π = 0.3 × 10⁻⁴), and PA (Hd = 0.70, π = 0.2 × 10⁻⁴). Genetic differentiation between genetic clusters was evaluated by calculating pairwise FST values. As shown in Fig. 5d, population differentiation was relatively low between PS and PSeIII (FST = 0.54) and between PSeI and PSeII (FST = 0.59), whereas the FST values between most of the remaining clusters ranged from 0.7 to 0.9. The highest population differentiation was observed between PF and PA (FST = 0.98). The FST values between PSeI and the other genetic clusters were relatively low: PSeI and PF (FST = 0.80), PSeI and PS (FST = 0.77), PSeI and PSeIII (FST = 0.72), PSeI and PSeII (FST = 0.59), and PSeI and PA (FST = 0.72).
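For readers unfamiliar with the two diversity statistics, the following minimal sketch shows how Hd and π can be computed from aligned sequences. It assumes the standard definitions (Nei's haplotype diversity and the mean number of pairwise differences per site), which the text does not spell out. The function names and the four toy sequences are hypothetical, and gaps or missing data are not handled.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site across all pairs of aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Toy alignment: three haplotypes among four equal-length sequences.
seqs = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(f"Hd = {haplotype_diversity(seqs):.3f}")
print(f"pi = {nucleotide_diversity(seqs):.3f}")
```

Under these definitions a cluster can combine high Hd with low π, as reported for PF, when many haplotypes are present but differ at only a few sites.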
Nucleotide variation in the pea pan-plastome
To further characterize nucleotide variation in the pea pan-plastome, the 145 plastomes were aligned and nucleotide differences were analyzed across the dataset. A total of 1,579 variations were identified from the dataset (Table 1), including 965 SNVs, 24 block substitutions, 426 InDels, and 160 mixed variations of these three types. Among the SNVs, transitions were more frequent than transversions, with 710 transitions and 247 transversions. In transitions, T to G and A to C had 148 and 139 occurrences, respectively, while in transversions, G to A and C to T had 91 and 77 occurrences, respectively.
Table 1. Nucleotide variation in the pan-plastome of peas.
| Variant | Total | SNV | Substitution | InDel | Mix (InDel, SNV) | Mix (InDel, SUB) |
| --- | --- | --- | --- | --- | --- | --- |
| Total | 1,576 | 965 | 24 | 426 | 156 | 4 |
| CDS | 734 | 445 | 6 | 176 | 103 | 4 |
| Intron | 147 | 110 | 8 | 29 | 0 | 0 |
| tRNA | 20 | 15 | 1 | 4 | 0 | 0 |
| rRNA | 11 | 3 | 0 | 6 | 2 | 0 |
| IGS | 663 | 392 | 9 | 211 | 51 | 0 |

When variants were analyzed by their position relative to genes (Fig. 6), there were 731 variations in CDSs, accounting for 46.3% of the total variations, including 443 SNVs (60.6%), six block substitutions (0.83%), 175 InDels (23.94%), and 107 mixed variations (14.64%). There were 104 variants in introns, accounting for 6.59% of the total variations, including 78 SNVs (75%), seven block substitutions (6.73%), and 19 InDels (18.27%). Intergenic spacers (IGS) contained 660 variations, accounting for 41.8% of the total variations, including 394 SNVs (59.7%), nine block substitutions (1.36%), 207 InDels (31.36%), and 50 mixed variations (7.58%). The tRNA regions contained 63 variants, accounting for 3.99% of the total variations, including 47 SNVs (74.6%) and 14 InDels (22.2%). The highest number of variants was detected in the IGS regions, while the lowest was found in introns. Among CDSs, accD (183) had the highest number of variations. In introns, rpl16 (18) and ndhA (16) had the most variants. In the IGS regions, ndhD-trnI-CAU (73) and trnL-UAA-trnT-UGU (44) possessed the greatest number of variants.
Figure 6.
Variant locations within the pea pan-plastome categorized by genic position (Introns, CDS, and IGS).
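The regional percentages quoted in the text follow directly from the regional counts and the overall total of 1,579 variations. The short sketch below reproduces that arithmetic; the counts are the ones given in the text, while the variable and function names are hypothetical.

```python
# Counts as reported in the text for each genic position.
TOTAL_VARIANTS = 1579
region_counts = {"CDS": 731, "intron": 104, "IGS": 660, "tRNA": 63}


def share(count, total):
    """Percentage of `total` represented by `count`."""
    return 100 * count / total


shares = {region: share(n, TOTAL_VARIANTS) for region, n in region_counts.items()}
for region, pct in shares.items():
    print(f"{region}: {pct:.2f}%")
```

The same helper can be applied within a region, for example to express the SNV count in CDSs as a share of the CDS total.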
Finally, examples of some genes with typical variants were provided to better illustrate the sequence differences between clades (Fig. 7). For example, the present analysis revealed that the ycf1 gene exhibited a high number of variant loci, which included unique single nucleotide variants (SNVs) specific to the P. abyssinicum clade. Additionally, a unique InDel variant belonging to P. abyssinicum was identified. Similar unique SNVs and InDels were also found in other genes, such as matK and rpoC2, distinguishing the P. fulvum clade from others. These unique SNVs and InDels could serve as DNA barcodes to distinguish different maternal lineages of peas.
-
This study newly assembled 103 complete pea plastomes. These plastomes were combined with 42 published pea plastomes to construct the first pan-plastome of peas. The length of pea plastomes ranged from 120,826 to 122,547 bp, with the GC content varying from 34.74% to 34.87%. The codon usage pattern in the pea pan-plastome displayed a strong bias for A/T in the third codon position. In addition, the codon usage of petB, psbA, rpl16, rps14, and rps18 was shown to be strongly influenced by natural selection. Three types of SSRs were detected in the pea pan-plastome: A/T, AT/TA, and AAT/ATT. Phylogenetic analysis of the pea pan-plastome resolved seven well-supported clades. The genes ycf1, rpoC2, and matK were found to be suitable for DNA barcoding due to their hypervariability. The pea pan-plastome provides a valuable resource for future breeding and selection research, considering the central role chloroplasts play in plant metabolism as well as the association of plastotype with important agronomic traits such as disease resistance and interspecific compatibility.
-
About this article
Cite this article
Kan J, Nie L, Wang M, Tiwari R, Tembrock LR, et al. 2024. The Mendelian pea pan-plastome: insights into genomic structure, evolutionary history, and genetic diversity of an essential food crop. Genomics Communications 1: e004 doi: 10.48130/gcomm-0024-0004
The Mendelian pea pan-plastome: insights into genomic structure, evolutionary history, and genetic diversity of an essential food crop
- Received: 12 August 2024
- Revised: 29 October 2024
- Accepted: 29 October 2024
- Published online: 27 November 2024
Abstract: The Mendelian pea (Pisum sativum), a member of the Fabaceae family, is widely cultivated worldwide as an important food resource. While extensive genetic studies have been conducted on pea, a comprehensive pan-plastome assembly has not yet been achieved. The present study combined 103 newly assembled pea plastomes with 42 previously published plastomes to construct the first pea pan-plastome. The lengths of the plastomes varied from 120,826 to 122,547 bp, with an average GC content of 34.8%. Protein-coding genes in the pan-plastome exhibited a strong bias towards A/T in the third codon position, with a notably high RSCU value (4.8) for the arginine codon CGT in cemA. Additionally, the codon usage of petB, psbA, rpl16, rps14, and rps18 showed an extreme influence of natural selection. Moreover, the genes ycf1, rpoC2, and matK were identified as hypervariable regions, suggesting their potential utility as DNA barcoding loci to distinguish maternal lineages for breeding and other agronomic purposes. The phylogenetic results indicated that cultivated peas had undergone at least two independent domestications, originating from the PA and PS groups. In contrast to former research based on nuclear data, the PSeI-a group and PSeI-b group were newly found to branch between the PA group and PF group.
Key words:
- Pisum sativum
- Pan-plastome
- Population structure
- DNA barcoding