ARTICLE   Open Access    

BSA-seq and transcriptome analyses reveal candidate gene associated with petiole color in papaya (Carica papaya L.)

More Information
  • Papaya (Carica papaya L.) is an important tropical species popular for highly nutritious fruit as well as medicinal value. In addition, non-commercial cultivation of papaya trees has resulted in dual-purpose cultivars grown for both fruit and ornamental value in residential areas. Petiole color is a key ornamental trait in papaya that varies amongst cultivars depending on anthocyanin accumulation resulting in purple or green pigmentation. Although inherited as a simple trait, genetic characterization and genomic loci responsible for the purple petiole color in papaya is unknown. In this study, F1 and F2 populations generated from two breeding lines PR-2043 (green petiole) and T5-2562 (purple petiole) were used to evaluate the inheritance patterns of petiole color as well as determine genetic loci and genes involved in petiole pigmentation in papaya through bulk segregant analysis (BSA) and transcriptome sequencing. The segregation of purple petiole color followed a single dominant gene inheritance model (3:1). BSA-seq analysis indicated key genes influencing petiole color are mainly located in chromosome 1 (0.01 to 5.96 Mb) of the papaya genome. Four major genes, including CHS, MYB20, MYB315-like, and MYB75-like within this region exhibited significant differential expression in a comparison between purple and green petiole papaya plants. A relatively high abundance of CHS transcripts was observed in purple petioles and may signify a major involvement in regulating anthocyanins accumulation in papaya petioles. The findings of this study facilitate the future efforts of breeding papaya cultivars with higher economical value in residential landscapes.
  • The Fabaceae family, the third largest family in angiosperms, contains about 24,480 species (WFO, https://wfoplantlist.org), and has been a historically important source of food crops[14]. Peas (formerly Pisum sativa L. renamed to Lathyrus oleraceusPisum spp. will be used hereafter due to historical references to varietal names and subspecies that may not have been fully synonymized), are a member of the Fabaceae family and is among one of the oldest domesticated food crops with ongoing importance in feeding humans and stock. Peas originated in Western Asia and the Mediterranean basin where early finds from Egypt have been dated to ~4500 BCE and further east in Afghanistan from ~2000 BCE[5], and have since been extensively cultivated worldwide[6,7]. Given that peas are rich in protein, dietary fiber, vitamins, and minerals, have become an important part of people's diets globally[810].

    Domesticated peas are the result of long-term human selection and cultivation, and in comparison to wild peas, domesticated peas have undergone significant changes in morphology, growth habits, and yield[1113]. From the long period of domestication starting in and around Mesopotamia many diverse lineages of peas have been cultivated and translocated to other parts of the world[14,15]. The subspecies, Pisum sativum subsp. sativum is the lineage from which most cultivars have been selected and is known for possessing large, round, or oval-shaped seeds[16,17]. In contrast, the subspecies, P. sativum subsp. elatius, is a cultivated pea which more closely resembles wild peas and is mainly found in grasslands and desert areas in Europe, Western Asia, and North Africa. Pisum fulvum is native to the Mediterranean basin and the Balkan Peninsula[15,18], and is resistant to pea rust caused by the fungal pathogen Uromyces pisi. Due to its resistance to pea rust, P. fulvum has been cross-bred with cultivated peas in the development of disease-resistant strains[19]. These examples demonstrate the diverse history of the domesticated pea and why further study of the pea pan-plastome could be employed for crop improvement. While studies based on the nuclear genome have been used to explore the domestication history of pea, these approaches do not account for certain factors, such as maternal inheritance. Maternal lineages, which are inherited through plastomes, play a critical role in understanding the full domestication process. A pan-plastome-based approach will no doubt allow us to investigate the maternal genetic contributions and explore evolutionary patterns that nuclear genome studies may overlook. Besides, pan-plastome analysis enables researchers to systematically compare plastome diversity across wild and cultivated species, identifying specific regions of the plastome that contribute to desirable traits. These plastid traits can then be transferred to cultivated crops through introgression breeding or genetic engineering, leading to varieties with improved resistance to disease, environmental stress, and enhanced agricultural performance.

    Plastids are organelles present in plant cells and are the sites in which several vital biological processes take place, such as photosynthesis in chloroplasts[2024]. Because the origin of plastids is the result of an ancient endosymbiotic event, extant plastids retain a genome (albeit much reduced) from the free-living ancestor[25]. With the advancement of high-throughput DNA sequencing technology, over 13,000 plastid genomes or plastomes have been published in public databases by the autumn of 2023[24]. Large-scale comparison of plastomic data at multiple taxonomic levels has shown that plastomic data can provide valuable insights into evolution, interspecies relationships, and population genetic structure. The plastome, in most cases, displays a conserved quadripartite circular genomic architecture with two inverted repeat (IR) regions and two single copy (SC) regions, referred to as the large single-copy (LSC) and small single-copy (SSC) regions. However, some species have lost one copy of the inverted repeated regions, such as those in Erodium (Geraniaceae family)[26,27] and Medicago (Fabaceae)[28,29]. Compared to previous plastomic studies based on a limited number of plastomes, the construction of pan-plastomes attempts to describe all nucleotide variants present in a lineage through intensive sampling and comparisons. Such datasets can provide detailed insights into the maternal history of a species and help to better understand applied aspects such as domestication history or asymmetries in maternal inheritance, which can help guide future breeding programs. Such pan-plastomes have recently been constructed for several agriculturally important species. A recent study focuses on the genus Gossypium[20], using plastome data at the population level to construct a robust map of plastome variation. It explored plastome diversity and population structure relationships within the genus while uncovering genetic variations and potential molecular marker loci in the plastome. Besides, 65 samples were combined to build the pan-plastome of Hemerocallis citrina[30] , and 322 samples for the Prunus mume pan-plastome[31]. Before these recent efforts, similar pan-plastomes were also completed for Beta vulgaris[32], and Nelumbo nucifera[33]. However, despite the agricultural importance of peas, no such pan-plastome has been completed.

    In this study, 103 complete pea plastomes were assembled and combined another 42 published plastomes to construct the pan-plastome. Using these data, the following analyses were conducted to better understand the evolution and domestication history of pea: (1) genome structural comparisons, (2) codon usage bias, (3) simple sequence repeat patterns, (4) phylogenetic analysis, and (5) nucleotide variation of plastomes in peas.

    One hundred and three complete pea plastomes were de novo assembled from public whole-genome sequencing data[34]. For data quality control, FastQC v0.11.5 (www.bioinformatics.babraham.ac.uk/projects/fastqc/) was utilized to assess the quality of the reads and ensure that the data was suitable for assembly. The clean reads were then mapped to a published pea plastome (MW308610) plastome from the GenBank database (www.ncbi.nlm.nih.gov/genbank) as the reference using BWA v0.7.17[35] and SAMtools v1.9[36] to isolate plastome-specific reads from the resequencing data. Finally, these plastome-specific reads were assembled de novo using SPAdes v3.15.2[37]. The genome annotation was conducted by Geseq online program (https://chlorobox.mpimp-golm.mpg.de/geseq.html). Finally, the OGDRAW v1.3.1[38] program was utilized to visualize the circular plastome maps with default settings. To better resolve the pan-plastome for peas, 42 complete published pea plastomes were also downloaded from NCBI and combined them with the de novo data (Supplementary Table S1).

    To investigate the codon usage in the pan-plastome of pea, we utilized CodonW v.1.4.2 (http://codonw.sourceforge.net) to calculate the Relative Synonymous Codon Usage (RSCU) value of the protein-coding genes (PCGs) longer than 300 bp, excluding stop codons. The RSCU is a calculated metric used to evaluate the relative frequency of usage among synonymous codons encoding the same amino acid. An RSCU value above 1 suggests that the codon is utilized more frequently than the average for a synonymous codon. Conversely, a value below 1 indicates a lower-than-average usage frequency. Besides, the Effective Number of Codons (ENC) and the G + C content at the third position of synonymous codons (GC3s) were also calculated in CodonW v.1.4.2. The ENC value and GC3s value were utilized for generating the ENC-GC3s plot, with the expected ENC values (standard curve), are calculated according to formula: ENC = 2 + GC3s + 29 / [GC3s2 + (1 – GC3s)2][39].

    The MISA program[40] was utilized to detect simple sequence repeats (SSRs), setting the minimum threshold for repeat units at 10 for mono-motifs, 6 for di-motifs, and 5 for tri-, tetra-, penta-, and hexa-motif microsatellites, respectively.

    The 145 complete pea plastomes were aligned using MAFFT v 7.487[41]. Single nucleotide variants (SNVs)-sites were used to derive an SNV only dataset from the entire-plastome alignment[42]. A total of 959 SNVs were analyzed using IQ-TREE v2.1[43] with a TVMe + ASC + R2 substitution model, determined by ModelTest-NG[44] based on BIC, and clade support was assessed with 1,000 bootstrap replicates. Vavilovia formosa (MK604478) was chosen as an outgroup. The principal coordinates analysis (PCA) was conducted in TASSEL 5.0[45].

    DnaSP v6[46] was utilized to identify different haplotypes among the plastomes, with gaps and missing data excluded. Haplotype networks were constructed in Popart v1.7[47] using the median-joining algorithm. Haplotype diversity (Hd) for each group was calculated by DnaSP v6[46], and the evolutionary distances based on the Tamura-Nei distance model were computed based on the population differentiation index (FST) between different groups with the plastomic SNVs.

    In this study, the pan-plastome structure of peas was elucidated (Fig. 1). The length of these plastomes ranged from 120,826 to 122,547 bp. And the overall GC content varied from 34.74% to 34.87%. In contrast to typical plastomes characterized by a tetrad structure, the plastomes of peas contained a single IR copy. The average GC content among all pea plastomes was 34.8%, with the highest amount being 34.84% and the lowest 34.74%, with minimal variation among the pea plastomes.

    Figure 1.  Pea pan-plastome annotation map. Indicated by arrows, genes listed inside and outside the circle are transcribed clockwise and counterclockwise, respectively. Genes are color-coded by their functional classification. The GC content is displayed as black bars in the second inner circle. SNVs, InDels, block substitutions and mixed variants are represented with purple, green, red, and black lines, respectively. Single nucleotide variants (SNVs), block substitutions (BS, two or more consecutive nucleotide variants), nucleotide insertions or deletions (InDels), and mixed sites (which comprise two or more of the preceding three variants at a particular site) are the four categories into which variants are divided.

    A total of 110 unique genes were annotated (Supplementary Table S2), of which 76 genes were PCGs, 30 were transfer RNA (tRNA) genes and four were ribosomal RNA (rRNA) genes. Genes containing a single intron, include nine protein-coding genes (rpl16, rpl2, ndhB, ndhA, petB, petD, rpoC1, clpP, atpF) and six tRNA genes (trnK-UUU, trnV-UAC, trnL-UAA, trnA-UGC, trnI-GAU, trnG-UCC). Additionally, two protein-coding genes ycf3 and rps12 were found to contain two introns.

    The codon usage frequency in pea plastome genes is shown in Fig. 2a. The analysis of codon usage in the pea plastome indicated significant biases for specific codons across various amino acids. Here a nearly average usage in some amino acids was observed, such as Alanine (Ala) and Valine (Val). For most amino acids, the usage of different synonymous codons was not evenly distributed. Regarding stop codons, a nearly even usage was found, with 37.0% for TAA, 32.2% for TAG and 30.8% for TGA.

    Figure 2.  (a) The overall codon usage frequency in 51 CDSs (length > 300 bp) from the pea pan-plastome. (b) The heatmap of RSCU values in 51 CDSs (length > 300 bp) from the pea pan-plastome. The x-axis represents different codons and the y-axis represents different CDSs. The tree at the top was constructed based on a Neighbor-Joining algorithm.

    The RSCU heatmap (Fig. 2b) showed different RSCU values for all codons in plastomic CDSs. In general, a usage bias for A/T in the third position of codons was found among CDSs in the pea pan-plastome. The RSCU values among these CDSs ranged from 0 to 4.8. The highest RSCU value (4.8) was found with the CGT codon in the cemA gene, where six synonymous codons exist for Arg but only CGT (4.8) and AGG (1.2) were used in this gene. This explained in large part the extreme RSCU value for CGT, resulting in an extreme codon usage bias in this amino acid.

    In the ENC-GC3s plot (Fig. 3), 31 PCGs were shown below the standard curve, while 20 PCGs were above. Besides, around 12 PCGs were near the curve, which meant these PCGs were under the average natural selection and mutation pressure. This plot displayed that the codon usage preferences in pea pan-plastomes were mostly influenced by natural selection. Five genes were shown an extreme influence with natural selection for its extreme ΔENC (ENCexpected – ENC) higher than 5, regarding as petB (ΔENC = 5.18), psbA (ΔENC = 8.96), rpl16 (ΔENC = 5.62), rps14 (ΔENC = 14.29), rps18 (ΔENC = 6.46) (Supplementary Table S3).

    Figure 3.  The ENC-GC3s plot for pea pan-plastome, with GC3s as the x-axis and ENC as the y-axis. The expected ENC values (standard curve) are calculated according to formula: ENC = 2 + GC3s + 29 / [GC3s2 + (1 − GC3s)2].

    For SSR detection (Fig. 4), mononucleotide, dinucleotide, and trinucleotide repeats were identified in the pea pan-plastome including A/T, AT/TA, and AAT/ATT. The majority of these SSRs were mononucleotides (A/T), accounting for over 90% of all identified repeats. Additionally, we observed that A/T and AT/TA repeats were present in all pea accessions, whereas only about half of the plastomes contained AAT/ATT repeats. It was also found that the number of A/T repeats exhibited the greatest diversity, while the number of AAT/ATT repeats showed convergence in all plastomes that possessed this repeat.

    Figure 4.  Simple sequence repeats (SSRs) in the pea pan-plastome. The x-axis represents different samples of pea and the y-axis represents the number of repeats in this sample. (a) The number of A/T repeats in the peapan-plastome. (b) The number of AT/TA and AAT/ATT repeats of pea pan-plastomes.

    To better understand the phylogenetic relationships and evolutionary history of peas, a phylogenetic tree was reconstructed using maximum likelihood for 145 pea accessions utilizing the whole plastome sequences (Fig. 5a). The 145 pea accessions were grouped into seven clades with high confidence. These groups were named the 'PF group', 'PSeI-a group', 'PSeI-b group', 'PA group', 'PSeII group', 'PSeIII group', and the 'PS group'. The naming convention for these groups relates to the majority species names for accessions in each group, where P. fulvum makes up the 'PF group', P. sativum subsp. elatius in the 'PSeI-a group', 'PSeI-b group', 'PSeII group', and 'PSeIII group', P. abyssinicum in the 'PA group', and P. sativum in the 'PS group'. From this phylogenetic tree, it was observed that the 'PSeI-a group' and the 'PSeI-b group' had a close phylogenetic relationship and nearly all accessions in these two groups (except DCG0709 accession was P. sativum) were identified as P. sativum subsp. elatius. In addition to the P. sativum subsp. elatius found in PSeI, seven accessions from the PS group were identified as P. sativum subsp. elatius.

    The PCA results (Fig. 5b) also confirmed that domesticated varieties P. abyssinicum were closer to cultivated varieties PSeI and PSeII, while PSeIII was more closely clustered with cultivated varieties of P. sativum. A previous study has indicated that P. sativum subsp. sativum and P. abyssinicum were independently domesticated from different P. sativum subsp. elatius populations[34].

    The complete plastome sequences were utilized for haplotype analysis using TCS and median-joining network methods (Fig. 5c). A total of 76 haplotypes were identified in the analysis. The TCS network resolved a similar pattern as the other analyses in that six genetic clusters were resolved with genetic clusters PS and PSeIII being very closely related. The genetic cluster containing P. fulvum exhibits greater genetic distance from other genetic clusters. The genetic clusters containing P. abyssinicum (PA) and P. sativum (PS) had lower levels of intracluster differentiation. In the TCS network, Hap30 and Hap31 formed distinct clusters from other haplotypes, such as Hap27, which may account for the genetic difference between the 'PSeI-a group' and 'PSeI-b group'. The network analysis results were consistent with the findings of the phylogenetic tree and principal component analyses results in this study.

    Figure 5.  (a) An ML tree resolved from 145 pea plastomes. (b) PCA analysis showing the first two components. (c) Haplotype network of pea plastomes. The size of each circle is proportional to the number of accessions with the same haplotype. (d) Genetic diversity and differentiation of six clades of peas. Pairwise FST between the corresponding genetic clusters is represented by the numbers above the lines joining two bubbles.

    Among the six genetic clusters, the highest haplotype diversity (Hd) was observed in PSeIII (Hd = 0.99, π = 0.22 × 10−3), followed by PSeII (Hd = 0.96, π = 0.43 × 10−3), PSeI (Hd = 0.96, π = 0.94 × 10−3), PF (Hd = 0.94, π = 0.6 × 10−4), PS (Hd = 0.88, π = 0.3 × 10−4), and PA (Hd = 0.70, π = 0.2 × 10−4). Genetic differentiation was evaluated between each genetic cluster by calculating FST values. As shown in Fig. 5d, except for the relatively lower population differentiation between PS and PSeIII (FST = 0.54), and between PSeI and PSeII (FST = 0.59), the FST values between the remaining clades ranged from 0.7 to 0.9. The highest population differentiation was observed between PF and PA (FST = 0.98). The FST values between PSeI and different genetic clusters were relatively low, including PSeI and PF (FST = 0.80), PSeI and PS (FST = 0.77), PSeI and PSeIII (FST = 0.72), PSeI and PSeII (FST = 0.59), and PSeI and PA (FST = 0.72).

    To further determine the nucleotide variations in the pea pan-plastome, 145 plastomes were aligned and nucleotide differences analyzed across the dataset. A total of 1,579 variations were identified from the dataset (Table 1), including 965 SNVs, 24 Block Substitutions, 426 InDels, and 160 mixed variations of these three types. Among the SNVs, transitions were more frequent than transversions, with 710 transitions and 247 transversions. In transitions, T to G and A to C had 148 and 139 occurrences, respectively, while in transversions, G to A and C to T had 91 and 77 occurrences, respectively.

    Table 1.  Nucleotide variation in the pan-plastome of peas.
    Variant Total SNV Substitution InDel Mix
    (InDel, SNV)
    Mix
    (InDel, SUB)
    Total 1,576 965 24 426 156 4
    CDS 734 445 6 176 103 4
    Intron 147 110 8 29 0 0
    tRNA 20 15 1 4 0 0
    rRNA 11 3 0 6 2 0
    IGS 663 392 9 211 51 0
     | Show Table
    DownLoad: CSV

    When analyzing variants by their position to a gene (Fig. 6), there were 731 variations in CDSs, accounting for 46.3% of the total variations, including 443 SNVs (60.6%), six block substitutions (0.83%), 175 InDels (23.94%), and four mixed variations (14.64%). There were 104 variants in introns, accounting for 6.59% of the total variations, including 78 SNVs (75%), seven block substitutions (6.73%), and 19 InDels (18.27%). IGS (Intergenic spacers) contained 660 variations, accounting for 41.8% of the total variations, including 394 SNVs (59.7%), nine block substitutions (1.36%), 207 InDels (31.36%), and 50 mixed variations (7.58%). The tRNA regions contained 63 variants, accounting for 3.99% of the total variations, including 47 SNVs (74.6%) and 14 InDels (22.2%). The highest number of variants were detected in the IGS regions, while the lowest were found in introns. Among CDSs, accD (183) had the highest number of variations. In introns, rpL16 (18) and ndhA (16) had the most variants. In the IGS regions, ndhD-trnI-CAU (73), and trnL-UAA-trnT-UGU (44) possessed the greatest number of variants.

    Figure 6.  Variant locations within the pea pan-plastome categorized by genic position (Introns, CDS, and IGS).

    Finally, examples of some genes with typical variants were provided to better illustrate the sequence differences between clades (Fig. 7). For example, the present analysis revealed that the ycf1 gene exhibited a high number of variant loci, which included unique single nucleotide variants (SNVs) specific to the P. abyssinicum clade. Additionally, a unique InDel variant belonging to P. abyssinicum was identified. Similar unique SNVs and InDels were also found in other genes, such as matK and rpoC2, distinguishing the P. fulvum clade from others. These unique SNVs and InDels could serve as DNA barcodes to distinguish different maternal lineages of peas.

    Figure 7.  Examples of variant sites.

    The present research combined 145 pea plastomes to construct a pan-plastome of peas. Compared to single plastomic studies, pan-plastome analyses across a species or genus provide a higher-resolution understanding of phylogenetic relationships and domestication history. Most plastomes in plants possess a quadripartite circular structure with two inverted repeat (IR) regions and two single copy regions (LSC and SSC)[2024]. However, the complete loss of one of the IR regions in the pea plastome was observed which is well-known among the inverted repeat-lacking clade (IRLC) species in Fabaceae. The loss of IRs has been documented in detail from other genera such as Erodium (Geraniaceae family)[26,27] and Medicago (Fabaceae family)[28,29]. This phenomenon although not commonly observed, constitutes a significant event in the evolutionary trajectories of certain plant lineages[26]. Such large-scale changes in plastome architecture are likely driven in part by a combination of selective pressures and genetic drift[48]. In the pea pan-plastome, it was also found that, compared to some plants with IR regions, the length of the plastomes was much shorter, and the overall GC content was lower. This phenomenon was due to the loss of one IR with high GC content.

    Repetitive sequences are an important part of the evolution of plastomes and can be used to reconstruct genealogical relationships. Mononucleotide SSRs are consistently abundant in plastomes, with many studies identifying them as the most common type of SSR[4952]. Among these, while C/G-type SSRs may dominate in certain species[53,54], A/T types are more frequently observed in land plants. The present research was consistent with these previous conclusions, showing an A/T proportion exceeding 90% (Fig. 4). Due to their high rates of mutation, SSRs are widely used to study phylogenetic relationships and genetic variation[55,56]. Additionally, like other plants, pea plastome genes have a high frequency of A/Ts in the third codon position. This preference is related to the higher AT content common among most plant plastomes and Fabaceae plastomes in particular with their single IRs[57,58]. The AT-rich regions are often associated with easier unwinding of DNA during transcription and potentially more efficient and accurate translation processes[59]. The preference for A/T in third codon positions may also be influenced by tRNA availability, as the abundance of specific tRNAs that recognize these codons can enhance the efficiency of protein synthesis[60,61]. However, not all organisms exhibit this preference for A/T-ending codons. For instance, many bacteria have GC-rich genomes and thus show a preference for G/C-ending codons[6264]. This variation in codon usage bias reflects the differences in genomic composition and the evolutionary pressures unique to different lineages.

    This study also comprehensively examined the variant loci of the pea pan-plastome. Among these variant sites, some could potentially serve as DNA barcode sites for specific lineages of peas, such as ycf1, rpoC2, and matK. Both ycf1 and matK have been widely used as DNA barcodes in many species[6568], as they are hypervariable. Researchers now have a much deeper understanding of the crucial role plastomes have played in plant evolution[6971]. By generating a comprehensive map of variant sites, future researchers can now more effectively trace differences in plastotypes to physiological and metabolic traits for use in breeding elite cultivars.

    The development of a pan-plastome for peas provides new insights into the maternal domestication history of this important food crop. Based on the phylogenetic analysis in this study, we observed a clear differentiation between wild and cultivated peas, with P. fulvum being the earliest diverging lineage, and was consistent with former research[34]. The ML tree (Fig. 5a) indicated that cultivated peas had undergone at least two independent domestications, namely from the PA and PS groups, which is consistent with former research[34]. However, as the present study added several accessions over the previous study and plastomic data was utilized, several differences were also found[34], such as the resolution of the two groups, referred to as PSeI-a group and PSeI-b group which branched between the PA group and PF group. Previous research based on nuclear data[34] only and with fewer accessions showed that the PA group and PF group were closely related in phylogeny, with no PSeI group appearing between them. One possible explanation is that the PSeI-a and PSeI-b lineages represents the capture and retention of a plastome from a now-extinct lineage while backcrossing to modern cultivars has obscured this signal in the nuclear genomic datasets. However, procedural explanations such as incorrectly identified accessions might have also resulted in such patterns. In either case, the presence of these plastomes in the cultivated pea gene pool should be explored for possible associations with traits such as disease resistance and hybrid incompatibility. This finding underscores the complexity of the domestication process and highlights the role of hybridization and selection in shaping the genetic landscape of cultivated peas. As such, future studies integrating data from the nuclear genome, mitogenome, and plastome will undoubtedly provide deeper insights into the phylogeny and domestication of peas. This pan-plastome research, encompassing a variety of cultivated taxa, will also support the development of elite varieties in the future.

    This study newly assembled 103 complete pea plastomes. These plastomes were combined with 42 published pea plastomes to construct the first pan-plastome of peas. The length of pea plastomes ranged from 120,826 to 122,547 bp, with the GC content varying from 34.74% to 34.87%. The codon usage pattern in the pea pan-plastome displayed a strong bias for A/T in the third codon position. Besides, the codon usage of petB, psbA, rpl16, rps14, and rps18 were shown extremely influenced by natural selection. Three types of SSRs were detected in the pea pan-plastome, including A/T, AT/TA, and AAT/ATT. From phylogenetic analysis, seven well-supported clades were resolved from the pea pan-plastome. The genes ycf1, rpoC2, and matK were found to be suitable for DNA barcoding due to their hypervariability. The pea pan-plastome provides a valuable supportive resource in future breeding and selection research considering the central role chloroplasts play in plant metabolism as well as the association of plastotype to important agronomic traits such as disease resistance and interspecific compatibility.

  • The authors confirm contribution to the paper as follows: study conception and design: Wang J; data collection: Kan J; analysis and interpretation of results: Kan J, Wang J; draft manuscript preparation: Kan J, Wang J, Nie L; project organization and supervision: Tiwari R, Wang M, Tembrock L. All authors reviewed the results and approved the final version of the manuscript.

  • The annotation files of newly assembled pea plastomes were uploaded to the Figshare website (https://figshare.com/, doi: 10.6084/m9.figshare.26390824).

  • This study was funded by the Guangdong Pearl River Talent Program (Grant No. 2021QN02N792) and the Shenzhen Fundamental Research Program (Grant No. JCYJ20220818103212025). This work was also funded by the Science Technology and Innovation Commission of Shenzhen Municipality (Grant No. RCYX20200714114538196) and the Innovation Program of Chinese Academy of Agricultural Sciences. We are also particularly grateful for the services of the High-Performance Computing Cluster in the Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences.

  • The authors declare that they have no conflict of interest.

  • Supplementary Table S1 The primers for candidate genes in papaya for qPCR.
    Supplementary Table S2 The expression level of DEGs relating to anthocyanins biosynthesis.
    Supplementary Table S3 The GO annotation of DEGs located in chromosome 1 QTL region.
    Supplementary Table S4 The NCBI blast annotation of CHS, MYB20, MYB75-like and MYB315-like.
    Supplementary Fig. S1 The differential expression heatmap of CHS, MYB20, MYB75-like and MYB315-like between purple and green petioles.
  • [1]

    Saúco VG, Herrero M, Hormaza JI. 2013. Tropical and subtropical fruits. In Horticulture: Plants for People and Places, eds Dixon G, Aldous D. Dordrecht: Springer. Volume 1. pp. 123–57. doi: 10.1007/978-94-017-8578-5_5

    [2]

    Krishna KL, Paridhavi M, Patel JA. 2008. Review on nutritional, medicinal and pharmacological properties of papaya (Carica papaya linn.). Indian Journal of Natural Products and Resources 7:364−73

    Google Scholar

    [3]

    Evans EA, Ballen FH. 2012. Overview of global papaya production, trade, and consumption: FE913/FE913, 9/2012. EDIS 2012(9):1−6

    Google Scholar

    [4]

    Saeed F, Arshad MU, Pasha I, Naz R, Batool R, et al. 2014. Nutritional and phyto-therapeutic potential of papaya (Carica papaya Linn.): an overview. International Journal of Food Properties 17:1637−53

    doi: 10.1080/10942912.2012.709210

    CrossRef   Google Scholar

    [5]

    Hofmeyr JDJ. 1938. Genetical studies of Carica papaya L. South African Journal of Science 35:300−04

    Google Scholar

    [6]

    Oziegbe M, Folorunso AE, Ajao DO. 2015. Inheritance of purple pigmentation in Carica papaya Linn. (caricaceae). International Journal of Plant Research 5:27−33

    Google Scholar

    [7]

    Grotewold E. 2006. The genetics and biochemistry of floral pigments. Annual Review of Plant Biology 57:761−80

    doi: 10.1146/annurev.arplant.57.032905.105248

    CrossRef   Google Scholar

    [8]

    Iwashina T. 2015. Contribution to flower colors of flavonoids including anthocyanins: a review. Natural Product Communications 10:529−44

    doi: 10.1177/1934578X1501000335

    CrossRef   Google Scholar

    [9]

    Fenster CB, Armbruster WS, Wilson P, Dudash MR, Thomson JD. 2004. Pollination syndromes and floral specialization. Annual Review of Ecology, Evolution, and Systematics 12:375−403

    doi: 10.1146/annurev.ecolsys.34.011802.132347

    CrossRef   Google Scholar

    [10]

    Steyn WJ, Wand SJE, Holcroft DM, Jacobs G. 2002. Anthocyanins in vegetative tissues: a proposed unified function in photoprotection. New Phytologist 155:349−61

    doi: 10.1046/j.1469-8137.2002.00482.x

    CrossRef   Google Scholar

    [11]

    He J, Giusti MM. 2010. Anthocyanins: natural colorants with health-promoting properties. Annual Review of Food Science and Technology 1:163−87

    doi: 10.1146/annurev.food.080708.100754

    CrossRef   Google Scholar

    [12]

    Tsuda T. 2012. Dietary anthocyanin-rich plants: biochemical basis and recent progress in health benefits studies. Molecular Nutrition & Food Research 56:159−70

    doi: 10.1002/mnfr.201100526

    CrossRef   Google Scholar

    [13]

    Kui LW, Bolitho K, Grafton K, Kortstee A, Karunairetnam S, et al. 2010. An R2R3 MYB transcription factor associated with regulation of the anthocyanin biosynthetic pathway in Rosaceae. BMC Plant Biology 10:50

    doi: 10.1186/1471-2229-10-50

    CrossRef   Google Scholar

    [14]

    Liu Y, Tikunov Y, Schouten RE, Marcelis LFM, Visser RGF, et al. 2018. Anthocyanin biosynthesis and degradation mechanisms in Solanaceous vegetables: a review. Frontiers in Chemistry 6:52

    doi: 10.3389/fchem.2018.00052

    CrossRef   Google Scholar

    [15]

    Holton TA, Cornish EC. 1995. Genetics and biochemistry of anthocyanin biosynthesis. The Plant Cell 7:1071−83

    doi: 10.2307/3870058

    CrossRef   Google Scholar

    [16]

    Dixon RA, Achnine L, Kota P, Liu CJ, Reddy MSS, et al. 2002. The phenylpropanoid pathway and plant defence—a genomics perspective. Molecular Plant Pathology 3:371−90

    doi: 10.1046/j.1364-3703.2002.00131.x

    CrossRef   Google Scholar

    [17]

    Fraser CM, Chapple C. 2011. The phenylpropanoid pathway in Arabidopsis. The Arabidopsis Book 9:e0152

    doi: 10.1199/tab.0152

    CrossRef   Google Scholar

    [18]

    Saito K, Yonekura-Sakakibara K, Nakabayashi R, Higashi Y, Yamazaki M, et al. 2013. The flavonoid biosynthetic pathway in Arabidopsis: structural and genetic diversity. Plant Physiology and Biochemistry 72:21−34

    doi: 10.1016/j.plaphy.2013.02.001

    CrossRef   Google Scholar

    [19]

    Koes RE, Quattrocchio F, Mol JNM. 1994. The flavonoid biosynthetic pathway in plants: function and evolution. BioEssays 16:123−32

    doi: 10.1002/bies.950160209

    CrossRef   Google Scholar

    [20]

    Xie XB, Li S, Zhang RF, Zhao J, Chen YC, et al. 2012. The bHLH transcription factor MdbHLH3 promotes anthocyanin accumulation and fruit colouration in response to low temperature in apples. Plant, Cell & Environment 35:1884−97

    doi: 10.1111/j.1365-3040.2012.02523.x

    CrossRef   Google Scholar

    [21]

    Gonzalez A, Zhao M, Leavitt JM, Lloyd AM. 2008. Regulation of the anthocyanin biosynthetic pathway by the TTG1/bHLH/Myb transcriptional complex in Arabidopsis seedlings. The Plant Journal 53:814−27

    doi: 10.1111/j.1365-313X.2007.03373.x

    CrossRef   Google Scholar

    [22]

    Xie Y, Tan H, Ma Z, Huang J. 2016. DELLA proteins promote anthocyanin biosynthesis via sequestering MYBL2 and JAZ suppressors of the MYB/bHLH/WD40 complex in Arabidopsis thaliana. Molecular Plant 9:711−21

    doi: 10.1016/j.molp.2016.01.014

    CrossRef   Google Scholar

    [23]

    Albert NW, Davies KM, Lewis DH, Zhang H, Montefiori M, et al. 2014. A conserved network of transcriptional activators and repressors regulates anthocyanin pigmentation in Eudicots. The Plant Cell 26:962−80

    doi: 10.1105/tpc.113.122069

    CrossRef   Google Scholar

    [24]

    LaFountain AM, Yuan YW. 2021. Repressors of anthocyanin biosynthesis. New Phytologist 231:933−49

    doi: 10.1111/nph.17397

    CrossRef   Google Scholar

    [25]

    Zhou H, Kui LW, Wang F, Espley RV, Ren F, et al. 2019. Activator-type R2R3-MYB genes induce a repressor-type R2R3-MYB gene to balance anthocyanin and proanthocyanidin accumulation. New Phytologist 221:1919−34

    doi: 10.1111/nph.15486

    CrossRef   Google Scholar

    [26]

    Takagi H, Abe A, Yoshida K, Kosugi S, Natsume S, et al. 2013. QTL-seq: rapid mapping of quantitative trait loci in rice by whole genome resequencing of DNA from two bulked populations. The Plant Journal 74:174−83

    doi: 10.1111/tpj.12105

    CrossRef   Google Scholar

    [27]

    Michelmore RW, Paran I, Kesseli RV. 1991. Identification of markers linked to disease-resistance genes by bulked segregant analysis: a rapid method to detect markers in specific genomic regions by using segregating populations. Proceedings of the National Academy of Sciences of the United States of America 88:9828−32

    doi: 10.1073/pnas.88.21.9828

    CrossRef   Google Scholar

    [28]

    Wang Z, Gerstein M, Snyder M. 2009. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10:57−63

    doi: 10.1038/nrg2484

    CrossRef   Google Scholar

    [29]

    Wang Z, Yan L, Chen Y, Wang X, Huai D, et al. 2022. Detection of a major QTL and development of KASP markers for seed weight by combining QTL-seq, QTL-mapping and RNA-seq in peanut. Theoretical and Applied Genetics 135:1779−95

    doi: 10.1007/s00122-022-04069-0

    CrossRef   Google Scholar

    [30]

    Lu H, Lin T, Klein J, Wang S, Qi J, et al. 2014. QTL-seq identifies an early flowering QTL located near Flowering Locus T in cucumber. Theoretical and Applied Genetics 127:1491−99

    doi: 10.1007/s00122-014-2313-z

    CrossRef   Google Scholar

    [31]

    Illa-Berenguer E, Van Houten J, Huang Z, van der Knaap E. 2015. Rapid and reliable identification of tomato fruit weight and locule number loci by QTL-seq. Theoretical and Applied Genetics 128:1329−42

    doi: 10.1007/s00122-015-2509-x

    CrossRef   Google Scholar

    [32]

    Davis MJ, White TL, Crane JH. 2004. Resistance to Papaya ringspot virus in transgenic papaya breeding lines. Proceedings of the Florida State Horticultural Society 117:241−45

    Google Scholar

    [33]

    Davis MJ, White TL, Crane JH. 2003. Papaya variety development in Florida. Annual Meeting of the Florida State Horticultural Society 116:4−6

    Google Scholar

    [34]

    Porebski S, Bailey LG, Baum BR. 1997. Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components. Plant Molecular Biology Reporter 15:8−15

    doi: 10.1007/BF02772108

    CrossRef   Google Scholar

    [35]

    Andrews S. 2010. FASTQC. A quality control tool for high throughput sequence data. www.bioinformatics.babraham.ac.uk/projects/fastqc/

    [36]

    Bushnell B. 2014. BBMap: a fast, accurate, splice-aware aligner. LBNL Report #: LBNL-7065E. US: Lawrence Berkeley National Laboratory. https://escholarship.org/uc/item/1h3515gn

    [37]

    Yue J, VanBuren R, Liu J, Fang J, Zhang X, et al. 2022. SunUp and Sunset genomes revealed impact of particle bombardment mediated transformation and domestication history in papaya. Nature Genetics 54:715−24

    doi: 10.1038/s41588-022-01068-1

    CrossRef   Google Scholar

    [38]

    Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25:1754−60

    doi: 10.1093/bioinformatics/btp324

    CrossRef   Google Scholar

    [39]

    Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, et al. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25:2078−79

    doi: 10.1093/bioinformatics/btp352

    CrossRef   Google Scholar

    [40]

    McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20:1297−303

    doi: 10.1101/gr.107524.110

    CrossRef   Google Scholar

    [41]

    Mansfeld BN, Grumet R. 2018. QTLseqr: an R package for bulk segregant analysis with next-generation sequencing. The Plant Genome 11:180006

    doi: 10.3835/plantgenome2018.01.0006

    CrossRef   Google Scholar

    [42]

    Liao Y, Smyth GK, Shi W. 2014. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30:923−30

    doi: 10.1093/bioinformatics/btt656

    CrossRef   Google Scholar

    [43]

    Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. 2019. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37:907−15

    doi: 10.1038/s41587-019-0201-4

    CrossRef   Google Scholar

    [44]

    Love MI, Huber W, Anders S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15:550

    doi: 10.1186/s13059-014-0550-8

    CrossRef   Google Scholar

    [45]

    Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, et al. 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21:3674−76

    doi: 10.1093/bioinformatics/bti610

    CrossRef   Google Scholar

    [46]

    Kolde R, Kolde MR. 2015. Package 'pheatmap'. R package 1:790

    Google Scholar

    [47]

    Livak KJ, Schmittgen TD. 2001. Analysis of relative gene expression data using real-time quantitative PCR and the 2−ΔΔCᴛ method. Methods 25:402−08

    doi: 10.1006/meth.2001.1262

    CrossRef   Google Scholar

    [48]

    Albert NW, Lewis DH, Zhang H, Schwinn KE, Jameson PE, et al. 2011. Members of an R2R3-MYB transcription factor family in Petunia are developmentally and environmentally regulated to control complex floral and vegetative pigmentation patterning. The Plant Journal 65:771−84

    doi: 10.1111/j.1365-313X.2010.04465.x

    CrossRef   Google Scholar

    [49]

    Schwinn K, Venail J, Shang Y, Mackay S, Alm V, et al. 2006. A small family of MYB-regulatory genes controls floral pigmentation intensity and patterning in the genus Antirrhinum. The Plant Cell 18:831−51

    doi: 10.1105/tpc.105.039255

    CrossRef   Google Scholar

    [50]

    Yamagishi M, Shimoyamada Y, Nakatsuka T, Masuda K. 2010. Two R2R3-MYB genes, homologs of Petunia AN2, regulate anthocyanin biosyntheses in flower tepals, tepal spots and leaves of asiatic hybrid lily. Plant and Cell Physiology 51:463−74

    doi: 10.1093/pcp/pcq011

    CrossRef   Google Scholar

    [51]

    Lachman J, Hamouz K, Šulc M, Orsák M, Pivec V, et al. 2009. Cultivar differences of total anthocyanins and anthocyanidins in red and purple-fleshed potatoes and their relation to antioxidant activity. Food Chemistry 114:836−43

    doi: 10.1016/j.foodchem.2008.10.029

    CrossRef   Google Scholar

    [52]

    Wiczkowski W, Szawara-Nowak D, Topolska J. 2013. Red cabbage anthocyanins: profile, isolation, identification, and antioxidant activity. Food Research International 51:303−09

    doi: 10.1016/j.foodres.2012.12.015

    CrossRef   Google Scholar

    [53]

    Kobayashi S, Goto-Yamamoto N, Hirochika H. 2004. Retrotransposon-induced mutations in grape skin color. Science 304:982

    doi: 10.1126/science.1095011

    CrossRef   Google Scholar

    [54]

    Espley RV, Brendolise C, Chagné D, Kutty-Amma S, Green S, et al. 2009. Multiple repeats of a promoter segment causes transcription factor autoregulation in red apples. The Plant Cell 21:168−83

    doi: 10.1105/tpc.108.059329

    CrossRef   Google Scholar

    [55]

    Jin W, Wang H, Li M, Wang J, Yang Y, et al. 2016. The R2R3 MYB transcription factor PavMYB10.1 involves in anthocyanin biosynthesis and determines fruit skin colour in sweet cherry (Prunus avium L.). Plant Biotechnology Journal 14:2120−33

    doi: 10.1111/pbi.12568

    CrossRef   Google Scholar

    [56]

    Jones CM, Mes P, Myers JR. 2003. Characterization and inheritance of the Anthocyanin fruit (Aft) tomato. Journal of Heredity 94:449−56

    doi: 10.1093/jhered/esg093

    CrossRef   Google Scholar

    [57]

    Butelli E, Garcia-Lor A, Licciardello C, Las Casas G, Hill L, et al. 2017. Changes in anthocyanin production during domestication of Citrus. Plant Physiology 173:2225−42

    doi: 10.1104/pp.16.01701

    CrossRef   Google Scholar

    [58]

    Song H, Yi H, Lee M, Han CT, Lee J, et al. 2018. Purple Brassica oleracea var. capitata F. rubra is due to the loss of BoMYBL2–1 expression. BMC Plant Biology 18:82

    doi: 10.1186/s12870-018-1290-9

    CrossRef   Google Scholar

    [59]

    Zhang XH, Zheng XT, Sun BY, Peng CL, Chow WS. 2018. Over-expression of the CHS gene enhances resistance of Arabidopsis leaves to high light. Environmental and Experimental Botany 154:33−43

    doi: 10.1016/j.envexpbot.2017.12.011

    CrossRef   Google Scholar

    [60]

    Li M, Cao YT, Ye SR, Irshad M, Pan TF, et al. 2017. Isolation of CHS gene from Brunfelsia acuminata flowers and its regulation in anthocyanin biosysthesis. Molecules 22:44

    doi: 10.3390/molecules22010044

    CrossRef   Google Scholar

    [61]

    Ramsay NA, Glover BJ. 2005. MYB–bHLH–WD40 protein complex and the evolution of cellular diversity. Trends in Plant Science 10:63−70

    doi: 10.1016/j.tplants.2004.12.011

    CrossRef   Google Scholar

    [62]

    Niu SS, Xu CJ, Zhang WS, Zhang B, Li X, et al. 2010. Coordinated regulation of anthocyanin biosynthesis in Chinese bayberry (Myrica rubra) fruit by a R2R3 MYB transcription factor. Planta 231:887−99

    doi: 10.1007/s00425-009-1095-z

    CrossRef   Google Scholar

    [63]

    Chiu LW, Zhou X, Burke S, Wu X, Prior RL, et al. 2010. The purple cauliflower arises from activation of a MYB transcription factor. Plant Physiology 154:1470−80

    doi: 10.1104/pp.110.164160

    CrossRef   Google Scholar

  • Cite this article

    Chen S, Michael VN, Brewer S, Chambers A, Wu X. 2025. BSA-seq and transcriptome analyses reveal candidate gene associated with petiole color in papaya (Carica papaya L.). Ornamental Plant Research 5: e002 doi: 10.48130/opr-0024-0032
    Chen S, Michael VN, Brewer S, Chambers A, Wu X. 2025. BSA-seq and transcriptome analyses reveal candidate gene associated with petiole color in papaya (Carica papaya L.). Ornamental Plant Research 5: e002 doi: 10.48130/opr-0024-0032

Figures(3)  /  Tables(1)

Article Metrics

Article views(914) PDF downloads(150)

ARTICLE   Open Access    

BSA-seq and transcriptome analyses reveal candidate gene associated with petiole color in papaya (Carica papaya L.)

Ornamental Plant Research  5 Article number: e002  (2025)  |  Cite this article

Abstract: Papaya (Carica papaya L.) is an important tropical species popular for highly nutritious fruit as well as medicinal value. In addition, non-commercial cultivation of papaya trees has resulted in dual-purpose cultivars grown for both fruit and ornamental value in residential areas. Petiole color is a key ornamental trait in papaya that varies amongst cultivars depending on anthocyanin accumulation resulting in purple or green pigmentation. Although inherited as a simple trait, genetic characterization and genomic loci responsible for the purple petiole color in papaya is unknown. In this study, F1 and F2 populations generated from two breeding lines PR-2043 (green petiole) and T5-2562 (purple petiole) were used to evaluate the inheritance patterns of petiole color as well as determine genetic loci and genes involved in petiole pigmentation in papaya through bulk segregant analysis (BSA) and transcriptome sequencing. The segregation of purple petiole color followed a single dominant gene inheritance model (3:1). BSA-seq analysis indicated key genes influencing petiole color are mainly located in chromosome 1 (0.01 to 5.96 Mb) of the papaya genome. Four major genes, including CHS, MYB20, MYB315-like, and MYB75-like within this region exhibited significant differential expression in a comparison between purple and green petiole papaya plants. A relatively high abundance of CHS transcripts was observed in purple petioles and may signify a major involvement in regulating anthocyanins accumulation in papaya petioles. The findings of this study facilitate the future efforts of breeding papaya cultivars with higher economical value in residential landscapes.

    • Papaya (Carica papaya L., Caricaceae) is a widely cultivated fruit crop in tropical and subtropical regions around the world[1]. Papaya has economic and cultural importance due to its high yield, nutritional value, and medicinal properties[24]. The fruit is consumed when mature as a fresh fruit or when immature as a vegetable; processed products are also produced from it. The worldwide production of papaya in 2022 was estimated to be 13,822,328 metric tons according to the Food and Agriculture Organization Corporate Statistical Database (www.fao.org/faostat/en/#data/QCL). In addition to its value as a food crop, the striking leaf shape and distinctive plant architecture of papaya allow this plant to be used as an ornamental plant in residential landscapes.

      Petiole color is an important ornamental trait in papaya. The common petiole color is green; however, there is a purple color form. The accumulation of purple pigmentation on the petiole, combined with the green lamina gives the papaya plant a unique appearance, reminiscent of a purple and green umbrella. These two distinct petiole phenotypes were first described in 1938, where purple petiole was found to be dominant over non-purple stem color[5]. Subsequent genetic analysis showed that the inheritance of stem color was fairly associated with flower colors, and loosely linked to sex type[5]. Folorunso observed that the petiole color in papaya exhibited an equal segregation ratio of 1:1 (purple : green) among the offspring that resulted from open-pollinated crosses between female and hermaphrodite papaya parents with purple petioles[6]. Interestingly, the study also revealed that the purple petiole color co-segregated with the pigment color of petals, peduncle, fruit rind, and fruit stalk[6], suggesting a genetic linkage or shared regulatory pathway controlling the pigmentation across these tissues.

      In papaya, the accumulation of purple pigment is primarily attributed to a buildup of anthocyanin, which imparts the red to blue hues commonly observed in various plant tissues[7]. Anthocyanins are water-soluble natural pigments that belong to the flavonoid group, and are widely distributed across angiosperms[8]. The presence of anthocyanins is often associated with various biological functions in plants. They play a crucial role in attracting pollinators and seed dispersers, thus affecting plant reproduction rates[9]. Anthocyanins also enhance plant resilience by protecting against a range of biotic and abiotic stresses, including protection against UV light exposure[10]. Additionally, anthocyanins have been acknowledged for their antioxidant/anticarcinogenic properties and health-promoting effects in the prevention of heart disease, cardiovascular disease and cancer[11,12]. Given their importance, the pathways governing anthocyanin biosynthesis, degradation, and regulation have been extensively studied[13,14]. The biosynthesis of anthocyanins is primarily controlled by two gene groups: structural genes and regulatory genes[15]. Structural genes include those involved in the phenylpropanoid pathway, such as phenylalanine ammonia-lyase (PAL), cinnamate 4-hydroxylase (C4H), 4-coumarate:coenzyme A ligase (4CL)[16,17], which are responsible for the initial steps in the biosynthesis of flavonoids, and the genes in the flavonoid biosynthetic pathway, such as chalcone synthase (CHS), chalcone isomerase (CHI), flavanone 3-hydroxylase (F3H), 3'-hydroxylase (F3'H), flavonoid 3',5'-hydroxylase (F3'5'H), dihydroflavonol 4-reductase (DFR), leucoanthocyanidin dioxygenase (LDOX), anthocyanidin synthase (ANS), and flavonoid 3-O-glucosyltransferase (UFGT), which are active downstream of anthocyanin biosynthesis[18,19]. Regulatory genes usually influence the pattern and intensity of anthocyanin biosynthesis by controlling the expression of these structural genes. A 'MBW complex' has been widely recognized as a major regulator consisting of R2R3-MYB, basic helix–loop–helix (bHLH), and WD40 proteins[13,2022]. These proteins can act as either activators or repressors in controlling the accumulation of anthocyanins in plants[2325].

      QTL-Seq is a highly efficient approach for rapid identification of genetic loci associated with traits of interest, offering a significant advantage over the more time-consuming and costly conventional QTL analysis methods[26]. This technique integrates bulked-segregant analysis (BSA), an elegant method to rapidly identify the specific genomic region by analyzing two bulked DNA pools consisting of F2 progeny with contrasting phenotypes using next-generation sequencing[26,27]. By comparing two bulked DNA pools representing contrasting phenotypes, the candidate genomic regions or genes are identified via the distribution of single nucleotide polymorphisms (SNPs). In addition to QTL-seq, transcriptome analysis has gained recognition as a reliable strategy for discovering genes associated with specific traits. By examining the expression patterns of genes across different tissues or stages, transcriptome analysis can provide valuable insight into the molecular mechanisms underlying phenotypic variation[28].The combination of QTL-Seq and transcriptome studies has been widely applied to identify genes associated with target traits in different plant species[2931].

      Despite two previous studies on papaya petiole color[5,6], little follow-up work has been done. However, understanding the genetic mechanism that governs petiole color in papaya is not only crucial for unraveling the fundamental biology of this trait but also has significant potential for its practical application in breeding programs. By elucidating the genetic basis of purple pigmentation in petioles, breeders could develop papaya varieties with higher aesthetic and commercial appeal. It also can provide insight into the introduction of purple pigmentation into other tissues and can contribute to the development of novel ornamental or fruit qualities, thereby increasing the economic value of papaya, optimizing plant appeal to consumers and expanding their marketability. In the present study, a joint approach combining BSA-Seq and transcriptome analysis were employed to investigate the genetic basis of petiole color in papaya. By integrating these two methods, the aim is to pinpoint specific genomic regions and the genes responsible for regulating pigmentation of petiole color in papaya. The results from this study will contribute to a deep understanding of how pigmentation is regulated in papaya, and expands the economic value of papaya through breeding new cultivars with both ornamental and fruit traits.

    • Two breeding lines, PR-2043 with green petioles and T5-2562 with purple petioles, were developed by crossing transgenic lines X17-2 with 'Tainung No. 5', and 'Puerto Rico-65' respectively[32,33], and maintained at the Tropical Research and Education Center, University of Florida, Homestead, FL, USA. PR-2043 and T5-2562 were crossed to generate an F1 population, and eight F1 of these plants were transplanted to the field. A hermaphrodite purple petiole F1 papaya plant was selfed to generate the F2 segregating population. The F2 seeds were soaked in water overnight and subsequently immersed in 2.5 mM gibberellic acid for 30 min before sowing in April 2020. These pre-treated seeds were planted in a mixture of 1:1 Promix BX mycorrhiza and perlite. Each 38-cell tray was top-dressed with Osmocote 14-14-14 fertilizer. Seedlings were maintained in the greenhouse and watered as necessary. Phenotyping of the F2 seedlings was carried out in the greenhouse three months after sowing and further confirmed in the field two months later. The petiole color was visually categorized as 'green' or 'purple', and the purple became more visible with plant growth. Chi-square analysis was conducted to assess the segregation ratio of petiole color in the F2 population.

    • The total genomic DNA was extracted from young leaves of individual F2 lines and the two parents (T5-2562 and PR-2590) following a CTAB method[34]. A total of 25 DNA samples representing each petiole color phenotype were pooled together into two DNA bulks for sequencing. Four sequencing libraries were constructed by shearing DNA into short fragments, repairing the ends, and making poly-A-tailed fragments before ligation with Illumina adapters. After size selection, quantified libraries were pooled and sequenced using a 150 bp paired-end program on Illumina HiSeq X10 platform (Novogene, Beijing, China).

      Quality control of the raw sequencing reads was first determined by FastQC[35]. To ensure high-confidence variant calling, the adapters were trimmed using BBDuk[36]. The processed reads were then used to create the consensus sequences of both T5-2562 and PR-2590 by aligning to the 'SunUp' reference genome[37]. Read alignment of both F2 bulked pools were assessed by BWA software[38], and SAMtools[39]. Picard tools were used to mark duplicate and index bam files of F2 bulked pools and each parent's consensus sequences. GenomeAnalysisToolkit (GATK) was used to perform variant calling[40]. SNPs and indels were filtered by GATK VariantFiltration function with parameters QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0. Low-quality SNPs were removed from the final output, which were subsequently used for QTL analysis with the 'QTLseqr' R package[41]. The confidence intervals were determined using 10,000 simulations of the QTL-seq method as described previously[26]. The 95% (p < 0.05) confidence interval was set to consider that the genomic loci showing statistical significance[41].

    • The epidermal and cortex layers were collected from the papaya petiole of mature (18 months-old) PR-2043 and T5-2562 plants for the transcriptome study (< 1 mm thick). Green petioles were collected from PR-2043 and petioles in the process of turning from green to purple were collected from T5-2562. The freshly harvested tissues were flash-frozen in liquid nitrogen and then ground into fine powder for RNA extraction, two technical replications were processed for each sample. A total of 100 mg of tissue was processed with 1 mL TRIzol reagent, followed by washing with 70% ethanol and resuspension in 50 μL of DEPC-treated water. RNA-free Dnase (Qiagen, Hilden, Germany) and Rneasy PowerClean Pro Cleanup Kit (Qiagen, Hilden, Germany) were applied for further purification. The NovaSeq6000 platform was used to perform the sequencing.

      Quality control and adapter removal of the raw sequence data were processed as described above. Clean reads were then mapped to the papaya reference genome using HISAT2 (--dta) before counting mapped transcripts with featureCount software following default parameters[42,43]. Genes that were differently expressed between the petioles of PR-2043 and T5-2562 were identified and quantified using DESeq2 with normalization[44]. Transcripts with a |log2(fold change)| > 2 were considered as differentially expressed genes and annotated by Blast2Go[45]. The candidate genes were identified from DEGs according to their function and false discovery rate (FDR) correction (> 0.05). The expression of candidate genes was visualized in a heatmap plotted by R Package 'Pheatmap'[46].

    • To verify candidate gene expression in green and purple petioles, PR-2240 and T5-2562 were used respectively. PR-2240 is a green-petiole line genetically associated with PR-2043. The purple and green epidermal and cortex layers of petioles from mature (18 months old) T5-2562 and PR-2240 plants were collected (< 1 mm thick) freshly and frozen using liquid nitrogen, respectively. Then, high-quality RNA was extracted from collected epidermal and cortex layers by using E.Z.N.A. Plant RNA Kit (Omega Bio-tek, GA, USA). The quantity of the RNA was determined by Qubit4 (Thermo Fisher, MA, USA). The RNA samples were aliquoted to uniform concentration (744 ng/μL) and reverse transcribed into cDNA using amfiRivert Sensi cDNA Master Mix (GenDEPOT, TX, USA). The CDS nucleotide sequences of each candidate gene and Primer3 were used to develop the primer pairs for CHS, and MYB20 (Supplementary Table S1). Three biological and technical replicates were processed for each candidate gene using the housekeeping gene actin as a control. The qPCR reaction was performed in QuantStudio3 (Applied Biosystems, CA, US) in a 10 μL reaction containing 5 μL 2X PowerUp™ SYBR™ Green Master Mix (Applied Biosystems), 0.4 μM of forward and reverse primer, 3.2 μL Nuclease Free water, and 1 μL cDNA. The mixture was initially held at 50 °C for 2 min and 95 °C for 2 min, incubated at 95 °C for 15 s, followed by 40 cycles at 55 °C for 15 s, and 72 °C for 1 min. The melt curve was set at 95 °C for 15 s, 60 °C for 1 min, and 95 °C for 1 s. The 2−ΔΔCᴛ method was used to analyze the relative changes in gene expression[47].

    • To investigate the inheritance of petiole color, an F2 population was developed. By crossing PR-2043 (green petiole) × T5-2562 (purple petiole, Fig. 1a), eight F1 plants were generated and all with purple petiole. A single fruit from one of the F1 plants was used to produce the F2 seedlings used in this study. In this study, the petiole color was evaluated three months after the seed germination and then the stability of the petiole color was confirmed two months later. Of the total 280 F2 seedlings, 223 were observed to have purple petioles, and 57 had green petioles (Fig. 1b), and the purple pigmentation was observed to become more noticeable as the plant grew. The purple-to-green color segregated at a 3:1 ratio in the F2 population. The Chi-square statistic and p-value were 3.219 and 0.0728 respectively (Fig. 1c), indicating that purple petioles in papaya follow a single dominant gene inheritance model.

      Figure 1. 

      Phenotypes of papaya petioles. (a) Petiole color of PR-2043 and T5-2562 parents. (b) Purple and green petioles of papaya F2 population. (c) Segregation of petiole color in the F2 population.

    • BSA-Seq analysis was used to examine the nucleotide diversity between F2 progenies with purple petioles (F2P) and green petioles (F2G) to characterize the genomic regions responsible for papaya purple petiole color. A total of 21.4 Gb (61.02 × depth) and 29.2 Gb (83.36 × depth) sequence reads (150 bp pair end) were generated for F2P and F2G bulks using whole genome sequence (Table 1). A total of 22.4 Gb (63.87 × depth) and 29.2 Gb (83.23 × depth) raw sequence reads was generated for T5-2562 and PR-2590, respectively. Consensus genomes of each parent were constructed by using papaya 'SunUp' genome as a reference. Subsequently, SNPs calling was carried out by comparing two F2 bulks and three genomes. The short reads of F2P and F2G bulks were aligned to the two parental consensus genomes and to the 'SunUp' genome, which yielded three sets of allelic segregation with 927,513, 518,567, and 1,423,583 SNPs, respectively. SNPs with low mapping rate (< 40%) were removed from the dataset, which yielded a total of 443,996 SNPs from purple parent, 235,895 SNPs from a green parent, and 687,084 SNPs from the reference genome for QTL mapping. At a 95% confidence interval, two QTL regions (189,558−1,368,545 bp and 2,739,922−3,777,906 bp) were identified on chromosome 1 of the reference genome, two QTLs were identified on chromosome 1 of the PR-2590 consensus sequence, spanning 621,177−1,791,321 bp and 3,799,705−5,554,073 bp and one QTL was identified on chromosome 1 of the T5-2562 consensus sequence (13,715−5,961,552 bp (Fig. 2). The QTL regions consistently overlapped across the same region in chromosome 1 of all three genomes with peak QTL SNPs supported by 99% confidence levels. Genome annotation identified a total of 653 genes located in the overlapping QTL region (13,715−5,961,552 bp).

      Table 1.  Sequencing information of parental lines and two bulks.

      Sample Raw reads Raw data Sequencing
      depth
      Effective (%) GC (%)
      T5-2562 149366802 22.4 63.87 99.12 37.36
      PR-2590 194642908 29.2 83.23 98.40 37.26
      F2P 142694068 21.4 61.02 98.03 37.05
      F2G 194944976 29.2 83.36 98.29 36.89

      Figure 2. 

      QTL regions associated with papaya petiole color in three genomes, (a) SunUp, (b) PR-2590, and (c) T5-2562.

    • A total of 2,145 differentially expressed genes (DEGs) (|log2fold change| > 2) were identified through the transcriptome profiling of PR-2043 and from T5-2562. The GO analysis found that most DEGs were involved in various molecular functions, including small molecular binding, and organic cyclic compound binding transferase activity. Nine DEGs were involved in flavonoid biosynthetic pathways including CHI, DFR, CHS, UFGT, and flavanol synthase. Thirty-five and 17 DEGs were identified as putative MYB and bHLH transcription factors, respectively (Supplementary Table S2).

    • The BSA-seq and transcriptome analysis identified a total of 67 genes within the QTL region on chromosome 1 that were differentially expressed between the green and purple petiole color papayas. Of them, the functional annotation identified 32 genes that acted on several biological processes, including the regulation of DNA-templated transcription, fruit ripening, methylation, and glutamine metabolism. Eleven of these play a role in molecular function, such as methyltransferase activity and nucleic acid binding. The remaining genes have a function in cellular components, including membrane, plasma membrane, and plasmodesma (Supplementary Table S3). Notably, four genes including chalcone synthase CHS, MYB315-like, MYB20, and MYB75-like, were associated with anthocyanin biosynthesis and regulation (Fig. 3a, Supplementary Fig. S1, Supplementary Table S4). CHS was highly expressed in purple petioles as compared to green petioles, suggesting CHS might play a key role in anthocyanin accumulation of papaya petioles.

      Figure 3. 

      Candidate genes associated with anthocyanin accumulation in papaya petiole. (a) The statistical information of candidate genes expression in different material. (b) The expression level validation of CHS and MYB20 in purple and green papaya petiole by q-PCR.

      The RNA was extracted from the epidermic layer of the petiole of T5-2562 and PR-2240 to determine the expression level of CHS and MYB20 using qPCR. The qPCR results showed that the expression of CHS and MYB20 in purple petioles were both more highly expressed than that of the green petioles (Fig. 3b). The qPCR expression pattern of CHS was consistent with the RNA-seq results, whereas MYB20 showed a contradictory pattern (Fig. 3a & b; Supplementary Fig. S1). Segregation analysis, transcriptome data, and qPCR validation suggest that the MYB20 may be involved in other biological functions during petiole development, but it is not associated with petiole color in papaya.

    • Anthocyanins, water-soluble pigments generated by the phenylpropanoid pathway, contribute many pink, purple, and blue hues in plants. Anthocyanins are not only natural dyes with brilliant colors but also edible consumption that benefit heart, eye, metabolic, and cognitive health in humans[12]. The accumulation of anthocyanins contributes to pigment diversity in distinct species pigment variation. It is very common in floral tissues[4850], and vegetative tissues[51,52]. While this within-species pigment variation is rare in displaying contrasting fruit color, like grapes[53], apples[54], and cherries[55].

      Papaya is an economically and culturally important crop in the tropical areas of the world. Ornamental traits such as petiole color, leaf shape, and growth habit are value-added traits in papaya for homeowners and landscapers. In papaya, anthocyanin accumulation only appears in a few phenotypes, specifically in the epidermis of the petiole, stem, fruit stem, and leaf vein. Additionally, the purple pigmentation in the petiole was observed to become more pronounced as the papaya plant grows[6]. The present genetic study revealed that the purple phenotype is dominant over the green in papaya and follows a single dominant inheritance pattern, which is consistent with the previous hypothesis of anthocyanin accumulation in papaya[5,6]. In other species including tomatoes[56], sweet cherries[55], and blood oranges[57], a single dominant gene has also been implicated as controlling contrasting anthocyanin phenotypes. In other cases, species such as purple cabbage, however, anthocyanin accumulation is regulated by a transcription repressor[58]. Although the evidence strongly supports purple as a dominant trait in papaya petioles, the prevalence of green petiole papayas in nature remains an enigma that demands more investigation. There is evidence indicating that the inheritance of purple pigmentation in papaya stem is loosely linked to sex type[5]. Therefore, one hypothesis is that the gene governing anthocyanin accumulation in papaya was subject to human selection based on sex types during cultivation.

      The anthocyanins biosynthetic pathway is downstream of the flavonoid pathway and includes structural genes such as CHS, CHI, F3H, F3'H, F3'5'H, DFR, LDOX, ANS, and UFGT[18,19]. CHI, DFR, CHS, and UFGT were found to be expressed differentially between purple and green petioles. Among these genes, CHS was the only differentially expressed gene that was also located in the QTL region identified by QTL-seq analysis. The flavonoid pathway begins when CHS mediates the synthesis of naringenin chalcone[14,15]. Several reports have indicated a positive correlation between CHS gene expression and anthocyanin content[59,60]. RNA-Seq and qPCR both verified the expression level of CHS in purple petiole is higher than that of green petiole in PR-2043 and PR-2240 compared to T5-2562, strongly suggesting that anthocyanin accumulation in papaya petiole is influenced by elevated CHS expression. MYB transcription factors have also been identified as a crucial group regulating anthocyanin biosynthesis either by acting independently on other structural genes or combining into MBW complexes with bHLH and WD40 proteins to regulate late pathway genes[61]. Contrasting anthocyanin accumulation phenotypes are often caused by mutations within the coding sequence of MYB factors, as in Chinese bayberry[62], or in the promoter region, e.g. in cauliflower[63]. In this study, a total of 35 and 17 MYB and bHLH transcripts, respectively, were detected as DEGs from the RNA-Seq analysis. Three of them lie within the QTL region identified through QTL-seq analysis. However, inconclusive expression patterns were observed in different papaya cultivars with green petioles through qPCR and RNA-Seq, suggesting further research is required to characterize the role of MYB20 in anthocyanin accumulation in papaya petioles.

      Anthocyanin-rich foods, such as eggplant and blueberry are popular in the market. High-anthocyanin varieties have been developed to meet the demand for diverse and nutrient-rich produce, like blood orange, red cabbage, etc. It has been reported that the purple color pigmentation in papaya has pleiotropic effects, which is also noticed in the fruit rind, fruit stalk, and peduncle[6]. Elucidation of the genetics governing purple pigmentation in this study will not only give insight into developing the different phenotypes of papaya to explore its ornamental value but also facilitate future efforts to breed the anthocyanin-rich papaya fruits. Furthermore, the genetic mechanism behind anthocyanin accumulation in vegetative tissues can have future applications. For example, anthocyanin accumulation genes can be transformed into plants that are driven by a papaya fruit-specific promoter, to potentially develop the anthocyanin-rich fruits. CRISPR technology can also be applied to papaya for seedling selection with sex types by using anthocyanin accumulation gene as an indicator, which would greatly benefit commercial papaya growers.

      • This work was supported in part by the U.S. Department of Agriculture Hatch project FLA-TRC-006217.

      • The authors confirm contribution to the paper as follows: study concept and design: Chambers A, Wu X; population development and phenotyping: Brewer S, Chen S; data analysis: Chen S, Brewer S, Michael VN; manuscript preparation: Chen S, Brewer S, Michael VN, Chambers A, Wu X; manuscript revision: Chen S, Michael VN, Wu X. All authors reviewed the results and approved the final version of the manuscript.

      • The data generated is available in the Gene Expression Omnibus (GEO), NCBI, via accession number GSE269737.

      • The authors declare that they have no conflict of interest.

      • Copyright: © 2025 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.
    Figure (3)  Table (1) References (63)
  • About this article
    Cite this article
    Chen S, Michael VN, Brewer S, Chambers A, Wu X. 2025. BSA-seq and transcriptome analyses reveal candidate gene associated with petiole color in papaya (Carica papaya L.). Ornamental Plant Research 5: e002 doi: 10.48130/opr-0024-0032
    Chen S, Michael VN, Brewer S, Chambers A, Wu X. 2025. BSA-seq and transcriptome analyses reveal candidate gene associated with petiole color in papaya (Carica papaya L.). Ornamental Plant Research 5: e002 doi: 10.48130/opr-0024-0032

Catalog

  • About this article

/

DownLoad:  Full-Size Img  PowerPoint
  • Table 1.  Nucleotide variation in the pan-plastome of peas.
    Variant Total SNV Substitution InDel Mix
    (InDel, SNV)
    Mix
    (InDel, SUB)
    Total 1,576 965 24 426 156 4
    CDS 734 445 6 176 103 4
    Intron 147 110 8 29 0 0
    tRNA 20 15 1 4 0 0
    rRNA 11 3 0 6 2 0
    IGS 663 392 9 211 51 0
     | Show Table
    DownLoad: CSV