ARTICLE   Open Access    

Pecan kinome: classification and expression analysis of all protein kinases in Carya illinoinensis

More Information
  • Protein kinases (PKs) are involved in plant growth and stress responses, and constitute one of the largest superfamilies due to numerous gene duplications. However, limited PKs have been functionally described in pecan, an economically important nut tree. Here, the comprehensive identification, annotation and classification of the entire pecan kinome are reported. A total of 967 PK genes were identified from the pecan genome, and further classified into 20 different groups and 121 subfamilies using the kinase domain sequences, which were verified by phylogenetic analysis. The receptor-like kinase (RLK) group contained 565 members, which constituted the largest group. Gene duplication contributed to the expansion of pecan kinome, 169 segmental duplication events including 285 PK genes were found, and the Ka/Ks ratio revealed they experienced strong negative selection. The RNA-Seq data of PK genes in pecan were further analyzed at the subfamily level, and different PK subfamilies performed various expression patterns across pecan embryo development or drought treatment, suggesting PK genes in pecan are involved in embryo development and drought stress response. Taken together, this study provides insight into the classification, expansion, evolution, and expression of pecan PKs. Our findings regarding expansion, expression and co-expression analyses lay a good foundation for future research to understand the roles of pecan PKs, and more efficiently determine the key candidate genes.
  • Aquaporin’s (AQPs) are small (21–34 kD) channel-forming, water-transporting trans-membrane proteins which are known as membrane intrinsic proteins (MIPs) conspicuously present across all kingdoms of life. In addition to transporting water, plant AQPs act to transport other small molecules including ammonia, carbon dioxide, glycerol, formamide, hydrogen peroxide, nitric acid, and some metalloids such as boron and silicon from the soil to different parts of the plant[1]. AQPs are typically composed of six or fewer transmembrane helices (TMHs) coupled by five loops (A to E) and cytosolic N- and C-termini, which are highly conserved across taxa[2]. Asparagine-Proline-Alanine (NPA) boxes and makeup helices found in loops B (cytosolic) and E (non-cytosolic) fold back into the protein's core to form one of the pore's two primary constrictions, the NPA region[1]. A second filter zone exists at the pore's non-cytosolic end, where it is called the aromatic/arginine (ar/R) constriction. The substrate selectivity of AQPs is controlled by the amino acid residues of the NPA and ar/R filters as well as other elements of the channel[1].

    To date, the AQP gene families have been extensively explored in the model as well as crop plants[39]. In seed plants, AQP distributed into five subfamilies based on subcellular localization and sequence similarities: the plasma membrane intrinsic proteins (PIPs; subgroups PIP1 and PIP2), the tonoplast intrinsic proteins (TIPs; TIP1-TIP5), the nodulin26-like intrinsic proteins (NIPs; NIP1-NIP5), the small basic intrinsic proteins (SIPs; SIP1-SIP2) and the uncategorized intrinsic proteins (XIPs; XIP1-XIP3)[2,10]. Among them, TIPs and PIPs are the most abundant and play a central role in facilitating water transport. SIPs are mostly found in the endoplasmic reticulum (ER)[11], whereas NIPs homologous to GmNod26 are localized in the peribacteroid membrane[12].

    Several studies reported that the activity of AQPs is regulated by various developmental and environmental factors, through which water fluxes are controlled[13]. AQPs are found in all organs such as leaves, roots, stems, flowers, fruits, and seeds[14,15]. According to earlier studies, increased AQP expression in transgenic plants can improve the plants' tolerance to stresses[16,17]. Increased root water flow caused by upregulation of root aquaporin expression may prevent transpiration[18,19]. Overexpression of Tamarix hispida ThPIP2:5 improved osmotic stress tolerance in Arabidopsis and Tamarix plants[20]. Transgenic tomatoes having apple MdPIP1;3 ectopically expressed produced larger fruit and improved drought tolerance[21]. Plants over-expressing heterologous AQPs, on the other hand, showed negative effects on stress tolerance in many cases. Overexpression of GsTIP2;1 from G. soja in Arabidopsis plants exhibited lower resistance against salt and drought stress[22].

    A few recent studies have started to establish a link between AQPs and nanobiology, a research field that has been accelerating in the past decade due to the recognition that many nano-substances including carbon-based materials are valuable in a wide range of agricultural, industrial, and biomedical activities[23]. Carbon nanotubes (CNTs) were found to improve water absorption and retention and thus enhance seed germination in tomatoes[24,25]. Ali et al.[26] reported that Carbon nanoparticles (CTNs) and osmotic stress utilize separate processes for AQP gating. Despite lacking solid evidence, it is assumed that CNTs regulate the aquaporin (AQPs) in the seed coats[26]. Another highly noticed carbon-nano-molecule, the fullerenes, is a group of allotropic forms of carbon consisting of pure carbon atoms[27]. Fullerenes and their derivatives, in particular the water-soluble fullerols [C60(OH)20], are known to be powerful antioxidants, whose biological activity has been reduced to the accumulation of superoxide and hydroxyl[28,29]. Fullerene/fullerols at low concentrations were reported to enhance seed germination, photosynthesis, root growth, fruit yield, and salt tolerance in various plants such as bitter melon and barley[3032]. In contrast, some studies also reported the phytotoxic effect of fullerene/fullerols[33,34]. It remains unknown if exogenous fullerene/fullerol has any impact on the expression or activity of AQPs in the cell.

    Garden pea (P. sativum) is a cool-season crop grown worldwide; depending on the location, planting may occur from winter until early summer. Drought stress in garden pea mainly affects the flowering and pod filling which harm their yield. In the current study, we performed a genome-wide identification and characterization of the AQP genes in garden pea (P. sativum), the fourth largest legume crop worldwide with a large complex genome (~4.5 Gb) that was recently decoded[35]. In particular, we disclose, for the first time to our best knowledge, that the transcriptional regulations of AQPs by osmotic stress in imbibing pea seeds were altered by fullerol supplement, which provides novel insight into the interaction between plant AQPs, osmotic stress, and the carbon nano-substances.

    The whole-genome sequence of garden pea ('Caméor') was retrieved from the URGI Database (https://urgi.versailles.inra.fr/Species/Pisum). Protein sequences of AQPs from two model crops (Rice and Arabidopsis) and five other legumes (Soybean, Chickpea, Common bean, Medicago, and Peanut) were used to identify homologous AQPs from the garden pea genome (Supplemental Table S1). These protein sequences, built as a local database, were then BLASTp searched against the pea genome with an E-value cutoff of 10−5 and hit a score cutoff of 100 to identify AQP orthologs. The putative AQP sequences of pea were additionally validated to confirm the nature of MIP (Supplemental Table S2) and transmembrane helical domains through TMHMM (www.cbs.dtu.dk/services/TMHMM/).

    Further phylogenetic analysis was performed to categorize the AQPs into subfamilies. The pea AQP amino acid sequences, along with those from Medicago, a cool-season model legume phylogenetically close to pea, were aligned through ClustalW2 software (www.ebi.ac.uk/Tools/msa/clustalw2) to assign protein names. The unaligned AQP sequences to Medicago counterparts were once again aligned with the AQP sequences of Arabidopsis, rice, and soybean. Based on the LG model, unrooted phylogenetic trees were generated via MEGA7 and the neighbor-joining method[36], and the specific name of each AQP gene was assigned based on its position in the phylogenetic tree.

    By using the conserved domain database (CDD, www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml), the NPA motifs were identified from the pea AQP protein sequences[37]. The software TMHMM (www.cbs. dtu.dk/services/TMHMM/)[38] was used to identify the protein transmembrane domains. To determine whether there were any alterations or total deletion, the transmembrane domains were carefully examined.

    Basic molecular properties including amino acid composition, relative molecular weight (MW), and instability index were investigated through the online tool ProtParam (https://web.expasy.org/protparam/). The isoelectric points (pI) were estimated by sequence Manipulation Suite version 2 (www.bioinformatics.org/sms2)[39]. The subcellular localization of AQP proteins was predicted using Plant-mPLoc[40] and WoLF PSORT (www.genscript.com/wolf-psort.html)[ 41] algorithms.

    The gene structure (intron-exon organization) of AQPs was examined through GSDS ver 2.0[42]. The chromosomal distribution of the AQP genes was illustrated by the software MapInspect (http://mapinspect.software.informer.com) in the form of a physical map.

    To explore the tissue expression patterns of pea AQP genes, existing NGS data from 18 different libraries covering a wide range of tissue, developmental stage, and growth condition of the variety ‘Caméor’ were downloaded from GenBank (www.ncbi.nlm.nih.gov/bioproject/267198). The expression levels of the AQP genes in each tissue and growth stage/condition were represented by the FPKM (Fragments Per Kilobase of transcript per Million fragments mapped) values. Heatmaps of AQPs gene were generated through Morpheus software (https://software.broadinstitute.org/morpheus/#).

    Different solutions, which were water (W), 0.3 M mannitol (M), and fullerol of different concentrations dissolved in 0.3 M mannitol (MF), were used in this study. MF solutions with the fullerol concentration of 10, 50, 100, and 500 mg/L were denoted as MF1, MF2, MF3, and MF4, respectively. Seeds of 'SQ-1', a Chinese landrace accession of a pea, were germinated in two layers of filter paper with 30 mL of each solution in Petri dishes (12 cm in diameter) each solution, and the visual phenotype and radicle lengths of 150 seeds for each treatment were analyzed 72 h after soaking. The radicle lengths were measured using a ruler. Multiple comparisons for each treatment were performed using the SSR-Test method with the software SPSS 20.0 (IBM SPSS Statistics, Armonk, NY, USA).

    Total RNA was extracted from imbibing embryos after 12 h of seed soaking in the W, M, and MF3 solution, respectively, by using Trizol reagent (Invitrogen, Carlsbad, CA, USA). The quality and quantity of the total RNA were measured through electrophoresis on 1% agarose gel and an Agilent 2100 Bioanalyzer respectively (Agilent Technologies, Santa Rosa, USA). The TruSeq RNA Sample Preparation Kit was utilized to construct an RNA-Seq library from 5 µg of total RNA from each sample according to the manufacturer's instruction (Illumina, San Diego, CA, USA). Next-generation sequencing of nine libraries were performed through Novaseq 6000 platform (Illumina, San Diego, CA, USA).

    First of all, by using SeqPrep (https://github.com/jstjohn/SeqPrep) and Sickle (https://github.com/najoshi/sickle) the raw RNA-Seq reads were filtered and trimmed with default parameters. After filtering, high-quality reads were mapped onto the pea reference genome (https://urgi.versailles.inra.fr/Species/Pisum) by using TopHat (V2.1.0)[43]. Using Cufflinks, the number of mapped reads from each sample was determined and normalised to FPKM for each predicted transcript (v2.2.1). Pairwise comparisons were made between W vs M and W vs M+F treatments. The DEGs with a fold change ≥ 1.5 and false discovery rate (FDR) adjusted p-values ≤ 0.05 were identified by using Cuffdiff[44].

    qPCR was performed by using TOROGGreen® qPCR Master Mix (Toroivd, Shanghai, China) on a qTOWER®3 Real-Time PCR detection system (Analytik Jena, Germany). The reactions were performed at 95 °C for 60 s, followed by 42 cycles of 95 °C for 10 s and 60 °C for 30 s. Quantification of relative expression level was achieved by normalization against the transcripts of the housekeeping genes β-tubulin according to Kreplak et al.[35]. The primer sequences for reference and target genes used are listed in Supplemental Table S3.

    The homology-based analysis identifies 41 putative AQPs in the garden pea genome. Among them, all but two genes (Psat0s3550g0040.1, Psat0s2987g0040.1) encode full-length aquaporin-like sequences (Table 1). The conserved protein domain analysis later validated all of the expected AQPs (Supplemental Table S2). To systematically classify these genes and elucidate their relationship with the AQPs from other plants' a phylogenetic tree was created. It clearly showed that the AQPs from pea and its close relative M. truncatula formed four distinct clusters, which represented the different subfamilies of AQPs i.e. TIPs, PIPs, NIPs, and SIPs (Fig. 1a). However, out of the 41 identified pea AQPs, 4 AQPs couldn't be tightly aligned with the Medicago AQPs and thus were put to a new phylogenetic tree constructed with AQPs from rice, Arabidopsis, and soybean. This additional analysis assigned one of the 4 AQPs to the XIP subfamily and the rest three to the TIP or NIP subfamilies (Fig. 1b). Therefore, it is concluded that the 41 PsAQPs comprise 11 PsTIPs, 15 PsNIPs, 9 PsPIPs, 5 PsSIPs, and 1 PsXIP (Table 2). The PsPIPs formed two major subgroups namely PIP1s and PIP2s, which comprise three and six members, respectively (Table 1). The PsTIPs formed two major subgroups TIPs 1 (PsTIP1-1, PsTIP1-3, PsTIP1-4, PsTIP1-7) and TIPs 2 (PsTIP2-1, PsTIP2-2, PsTIP2-3, PsTIP2-6) each having four members (Table 2). Detailed information such as gene/protein names, accession numbers, the length of deduced polypeptides, and protein structural features are presented in Tables 1 & 2

    Table 1.  Description and distribution of aquaporin genes identified in the garden pea genome.
    Chromosome
    S. NoGene NameGene IDGene length
    (bp)
    LocationStartEndTranscription length (bp)CDS length
    (bp)
    Protein length
    (aa)
    1PsPIP1-1Psat5g128840.32507chr5LG3231,127,859231,130,365675675225
    2PsPIP1-2Psat2g034560.11963chr2LG149,355,95849,357,920870870290
    3PsPIP1-4Psat2g182480.11211chr2LG1421,647,518421,648,728864864288
    4PsPIP2-1Psat6g183960.13314chr6LG2369,699,084369,702,397864864288
    5PsPIP2-2-1Psat4g051960.11223chr4LG486,037,44686,038,668585585195
    6PsPIP2-2-2Psat5g279360.22556chr5LG3543,477,849543,480,4042555789263
    7PsPIP2-3Psat7g228600.22331chr7LG7458,647,213458,649,5432330672224
    8PsPIP2-4Psat3g045080.11786chr3LG5100,017,377100,019,162864864288
    9PsPIP2-5Psat0s3550g0040.11709scaffold0355020,92922,63711911191397
    10PsTIP1-1Psat3g040640.12021chr3LG589,426,47389,428,493753753251
    11PsTIP1-3Psat3g184440.12003chr3LG5393,920,756393,922,758759759253
    12PsTIP1-4Psat7g219600.12083chr7LG7441,691,937441,694,019759759253
    13PsTIP1-7Psat6g236600.11880chr6LG2471,659,417471,661,296762762254
    14PsTIP2-1Psat1g005320.11598chr1LG67,864,8107,866,407750750250
    15PsTIP2-2Psat4g198360.11868chr4LG4407,970,525407,972,392750750250
    16PsTIP2-3Psat1g118120.12665chr1LG6230,725,833230,728,497768768256
    17PsTIP2-6Psat2g177040.11658chr2LG1416,640,482416,642,139750750250
    18PsTIP3-2Psat6g054400.11332chr6LG254,878,00354,879,334780780260
    19PsTIP4-1Psat6g037720.21689chr6LG230,753,62430,755,3121688624208
    20PsTIP5-1Psat7g157600.11695chr7LG7299,716,873299,718,567762762254
    21PsNIP1-1Psat1g195040.21864chr1LG6346,593,853346,595,7161863645215
    22PsNIP1-3Psat1g195800.11200chr1LG6347,120,121347,121,335819819273
    23PsNIP1-5Psat7g067480.12365chr7LG7109,420,633109,422,997828828276
    24PsNIP1-6Psat7g067360.12250chr7LG7109,270,462109,272,711813813271
    25PsNIP1-7Psat1g193240.11452chr1LG6344,622,606344,624,057831831277
    26PsNIP2-1-2Psat3g197520.1669chr3LG5420,092,382420,093,050345345115
    27PsNIP2-2-2Psat3g197560.1716chr3LG5420,103,168420,103,883486486162
    28PsNIP3-1Psat2g072000.11414chr2LG1133,902,470133,903,883798798266
    29PsNIP4-1Psat7g126440.11849chr7LG7209,087,362209,089,210828828276
    30PsNIP4-2Psat5g230920.11436chr5LG3463,340,575463,342,010825825275
    31PsNIP5-1Psat6g190560.11563chr6LG2383,057,323383,058,885867867289
    32PsNIP6-1Psat5g304760.45093chr5LG3573,714,868573,719,9605092486162
    33PsNIP6-2Psat7g036680.12186chr7LG761,445,34161,447,134762762254
    34PsNIP6-3Psat7g259640.12339chr7LG7488,047,315488,049,653918918306
    35PsNIP7-1Psat6g134160.24050chr6LG2260,615,019260,619,06840491509503
    36PsSIP1-1Psat3g091120.13513chr3LG5187,012,329187,015,841738738246
    37PsSIP1-2Psat1g096840.13609chr1LG6167,126,599167,130,207744744248
    38PsSIP1-3Psat7g203280.12069chr7LG7401,302,247401,304,315720720240
    39PsSIP2-1-1Psat0s2987g0040.1706scaffold02987177,538178,243621621207
    40PsSIP2-1-2Psat3g082760.13135chr3LG5173,720,100173,723,234720720240
    41PsXIP2-1Psat7g178080.12077chr7LG7335,167,251335,169,327942942314
    bp: base pair, aa: amino acid.
     | Show Table
    DownLoad: CSV
    Figure 1.  Phylogenetic analysis of the identified AQPs from pea genome. (a) The pea AQPs proteins aligned with those from the cool-season legume Medicago truncatual. (b) The four un-assigned pea AQPs in (a) (denoted as NA) were further aligned with the AQPs of rice, soybean, and Arabidopsis by using the Clustal W program implemented in MEGA 7 software. The nomenclature of PsAQPs was based on homology with the identified aquaporins that were clustered together.
    Table 2.  Protein information, conserved amino acid residues, trans-membrane domains, selectivity filter, and predicted subcellular localization of the 39 full-length pea aquaporins.
    S. NoAQPsGeneLengthTMHNPANPAar/R selectivity filterpIWoLF PSORTPlant-mPLoc
    LBLEH2H5LE1LE2
    Plasma membrane intrinsic proteins (PIPs)
    1PsPIP1-1Psat5g128840.32254NPA0F0008.11PlasPlas
    2PsPIP1-2Psat2g034560.12902NPANPAFHTR9.31PlasPlas
    3PsPIP1-4Psat2g182480.12886NPANPAFHTR9.29PlasPlas
    4PsPIP2-1Psat6g183960.12886NPANPAFHT08.74PlasPlas
    5PsPIP2-2-1Psat4g051960.1195300FHTR8.88PlasPlas
    6PsPIP2-2-2Psat5g279360.22635NPANPAFHTR5.71PlasPlas
    7PsPIP2-3Psat7g228600.22244NPA0FF006.92PlasPlas
    8PsPIP2-4Psat3g045080.12886NPANPAFHTR8.29PlasPlas
    Tonoplast intrinsic proteins (TIPs)
    1PsTIP1-1Psat3g040640.12517NPANPAHIAV6.34PlasVacu
    2PsTIP1-3Psat3g184440.12536NPANPAHIAV5.02Plas/VacuVacu
    3PsTIP1-4Psat7g219600.12537NPANPAHIAV4.72VacuVacu
    4PsTIP1-7Psat6g236600.12546NPANPAHIAV5.48Plas/VacuVacu
    5PsTIP2-1Psat1g005320.12506NPANPAHIGR8.08VacuVacu
    6PsTIP2-2Psat4g198360.12506NPANPAHIGR5.94Plas/VacuVacu
    7PsTIP2-3Psat1g118120.12566NPANPAHIAL6.86Plas/VacuVacu
    8PsTIP2-6Psat2g177040.12506NPANPAHIGR4.93VacuVacu
    9PsTIP3-2Psat6g054400.12606NPANPAHIAR7.27Plas/VacuVacu
    10PsTIP4-1Psat6g037720.22086NPANPAHIAR6.29Vac/ plasVacu
    11PsTIP5-1Psat7g157600.12547NPANPANVGC8.2Vacu /plasVacu/Plas
    Nodulin-26 like intrisic proteins (NIPs)
    1PsNIP1-1Psat1g195040.22155NPA0WVF06.71PlasPlas
    2PsNIP1-3Psat1g195800.12735NPANPVWVAR6.77PlasPlas
    3PsNIP1-5Psat7g067480.12766NPANPVWVAN8.98PlasPlas
    4PsNIP1-6Psat7g067360.12716NPANPAWVAR8.65Plas/VacuPlas
    5PsNIP1-7Psat1g193240.12776NPANPAWIAR6.5Plas/VacuPlas
    6PsNIP2-1-2Psat3g197520.11152NPAOG0009.64PlasPlas
    7PsNIP2-2-2Psat3g197560.116230NPA0SGR6.51PlasPlas
    8PsNIP3-1Psat2g072000.12665NPANPASIAR8.59Plas/VacuPlas
    9PsNIP4-1Psat7g126440.12766NPANPAWVAR6.67PlasPlas
    10PsNIP4-2Psat5g230920.12756NPANPAWLAR7.01PlasPlas
    11PsNIP5-1Psat6g190560.12895NPSNPVAIGR7.1PlasPlas
    12PsNIP6-1Psat5g304760.41622NPA0I0009.03PlasPlas
    13PsNIP6-2Psat7g036680.1254000G0005.27ChloPlas/Nucl
    14PsNIP6-3Psat7g259640.13066NPANPVTIGR8.32PlasPlas
    15PsNIP7-1Psat6g134160.25030NLK0WGQR8.5VacuChlo/Nucl
    Small basic intrinsic proteins (SIPs)
    1PsSIP1-1Psat3g091120.12466NPTNPAVLPN9.54PlasPlas/Vacu
    2PsSIP1-2Psat1g096840.12485NTPNPAIVPL9.24VacuPlas/Vacu
    3PsSIP1-3Psat7g203280.12406NPSNPANLPN10.32ChloPlas
    4PsSIP2-1-2Psat3g082760.12404NPLNPAYLGS10.28PlasPlas
    Uncharacterized X intrinsic proteins (XIPs)
    1PsXIP2-1Psat7g178080.13146SPVNPAVVRM7.89PlasPlas
    Length: protein length (aa); pI: Isoelectric point; Trans-membrane helicase (TMH) represents for the numbers of Trans-membrane helices predicted by TMHMM Server v.2.0 tool; WoLF PSORT and Plant-mPLoc: best possible cellualr localization predicted by the WoLF PSORT and Plant-mPLoc tool, respectively (Chlo Chloroplast, Plas Plasma membrane, Vacu Vacuolar membrane, Nucl Nucleus); LB: Loop B, L: Loop E; NPA: Asparagine-Proline-Alanine; H2 represents for Helix 2, H5 represents for Helix 5, LE1 represents for Loop E1, LE2 represents for Loop E2, Ar/R represents for Aromatic/Arginine.
     | Show Table
    DownLoad: CSV

    To understand the genome distribution of the 41 PsAQPs, we mapped these genes onto the seven chromosomes of a pea to retrieve their physical locations (Fig. 2). The greatest number (10) of AQPs were found on chromosome 7, whereas the least (2) on chromosome 4 (Fig. 2 and Table 1). Chromosomes 1 and 6 each contain six aquaporin genes, whereas chromosomes 2, 3, and 5 carry four, seven, and four aquaporin genes, respectively (Fig. 2). The trend of clustered distribution of AQPs was seen on specific chromosomes, particularly near the end of chromosome 7.

    Figure 2.  Chromosomal localization of the 41 PsAQPs on the seven chromosomes of pea. Chr1-7 represents the chromosomes 1 to 7. The numbers on the right of each chromosome show the physical map positions of the AQP genes (Mbp). Blue, green, orange, brown, and black colors represent TIPs, NIPs, PIPs, SIPs, and XIP, respectively.

    The 39 full-length PsAQP proteins have a length of amino acid ranging from 115 to 503 (Table 1) and Isoelectric point (pI) values ranging from 4.72 to 10.35 (Table 2). As a structural signature, transmembrane domains were predicted to exist in all PsAQPs, with the number in individual AQPs varying from 2 to 6. By subfamilies, TIPs harbor the greatest number of TM domains in total, followed by PIPs, NIPs, SIPs, and XIP (Table 2). Exon-intron structure analysis showed that most PsAQPs (16/39) having two introns, while ten members had three, seven members had four, and five members had only one intron (Fig. 3). Overall, PsAQPs exhibited a complex structure with varying intron numbers, positions, and lengths.

    Figure 3.  The exon-intron structures of the AQP genes in pea. Upstream/downstream region, exon, and intron are represented by a blue box, yellow box, and grey line, respectively.

    As aforementioned, generally highly conserved two NPA motifs generate an electrostatic repulsion of protons in AQPs to form the water channel, which is essential for the transport of substrate molecules[15]. In order to comprehend the potential physiological function and substrate specificity of pea aquaporins, NPA motifs (LB, LE) and residues at the ar/R selectivity filter (H2, H5, LE1, and LE2) were examined. (Table 2). We found that all PsTIPs and most PsPIPs had two conserved NPA motifs except for PsPIP1-1, PsPIP2-2-1, and PsPIP2-3, each having a single NPA motif. Among PsNIPs, PsNIP1-6, PsNIP1-6, PsNIP1-7, PsNIP3-1, PsNIP4-1 and PSNIP4-2 had two NPA domains, while PsNIP1-1, PsNIP2-1-2, PsNIP2-2-2 and PsNIP6-1 each had a single NPA motif. In the PsNIP sub-family, the first NPA motif showed an Alanine (A) to Valine (V) substitution in three PsNIPs (PsNIP1-3, PsNIP1-5, and PsNIP6-3) (Table 2). Furthermore, the NPA domains of all members of the XIP and SIP subfamilies were different. The second NPA motif was conserved in PsSIP aquaporins, however, all of the first NPA motifs had Alanine (A) replaced by Leucine (L) (PsSIP2-1-1, PsSIP2-1-2) or Threonine (T) (PsSIP1-1). In comparison to other subfamilies, this motif variation distinguishes water and solute-transporting aquaporins[45].

    Compared to NPA motifs, the ar/R positions were more variable and the amino acid composition appeared to be subfamily-dependent. The majority of PsPIPs had phenylalanine at H2, histidine at H5, threonine at LE1, and arginine at LE2 selective filter (Table 2). All of the PsTIP1 members had a Histidine-Isoleucine-Alanine-Valine structure at this position, while all PsTIP2 members but PsTIP2-3 harbored Histidine-Isoleucine-Glycine-Arginine. Similarly, PsNIPs, PsSIPs and PsXIP also showed subgroup-specific variation in ar/R selectivity filter (Table 2). Each of these substitutions partly determines the function of transporting water[46].

    Sequence-based subcellular localization analysis using WoLF PSORT predicted that all PsPIPs localized in the plasma membrane, which is consistent with their subfamily classification (Table 2). Around half (5/11) of the PsTIPs (PsTIP1-4, PsTIP2-1, PsTIP2-6, PsTIP4-1, and PsTIP5-1) were predicted to localize within vacuoles. However, several TIP members (PsTIP1-1, PsTIP1-3, PsTIP1-7, PsTIP2-2, PsTIP2-3 and PsTIP3-2) were predicted to localize in plasma membranes. We then further investigated their localizations by using another software (Plant-mPLoc, Table 2), which predicted that all the PsTIPs localize within vacuoles, thus supporting that they are tonoplast related. An overwhelming majority of PsNIPs (14/15) and PsXIP were predicted to be found only in plasma membranes., which was also expected (Table 2). Collectively, the versatility in subcellular localization of the pea AQPs is implicative of their distinct roles in controlling water and/or solute transport in the context of plant cell compartmentation.

    Tissue expression patterns of genes are indicative of their functions. Since there were rich resources of RNA-Seq data from various types of pea tissues in the public database, they were used for the extraction of expression information of PsAQP genes as represented by FPKM values. A heat map was generated to show the expression patterns of PsAQP genes in 18 different tissues/stages and their responses to nitrate levels (Fig. 4). According to the heat map, PsPIP1-2, PsPIP2-3 were highly expressed in root and nodule G (Low-nitrate), whereas PsTIP1-4, PsTIP2-6, and PsNIP1-7 were only expressed in roots in comparison to other tissues. The result also demonstrated that PsPIP1-1 and PsNIP3-1 expressed more abundantly in leaf, tendril, and peduncle, whereas PsPIP2-2-2 and PsTIP1-1 showed high to moderate expressions in all the samples except for a few. Interestingly, PsTIP1-1 expression in many green tissues seemed to be oppressed by low-nitrate. In contrast, some AQPs such as PsTIP1-3, PsTIP1-7, PsTIP5-1, PsNIP1-5, PsNIP4-1, PsNIP5-1, and PsSIP2-1-1 showed higher expression only in the flower tissue. There were interesting developmental stage-dependent regulations of some AQPs in seeds (Fig. 4). For example, PsPIP2-1, PsPIP2-2-1, PsNIP1-6, PsSIP1-1, and PsSIP1-2 were more abundantly expressed in the Seed_12 dap (days after pollination;) tissue than in the Seed_5 dai (days after imbibition) tissue; reversely, PsPIP2-2-2, PsPIP2-4, PsTIP2-3, and PsTIP3-2 showed higher expression in seed_5 dai in compare to seed_12 dap tissues (Fig. 4). The AQP genes may have particular functional roles in the growth and development of the pea based on their tissue-specific expression.

    Figure 4.  Heatmap analysis of the expression of pea AQP gene expressions in different tissues using RNA-seq data (PRJNA267198). Normalized expression of aquaporins in terms of reads per kilobase of transcript per million mapped reads (RPKM) showing higher levels of PIPs, NIPs, TIPs SIPs, and XIP expression across the different tissues analyzed. (Stage A represents 7-8 nodes; stage B represents the start of flowering; stage D represents germination, 5 d after imbibition; stage E represents 12 d after pollination; stage F represents 8 d after sowing; stage G represents 18 d after sowing, LN: Low-nitrate; HN: High-nitrate.

    Expressions of plant AQPs in vegetative tissues under normal and stressed conditions have been extensively studied[15]; however, little is known about the transcriptional regulation of AQP genes in seeds/embryos. To provide insights into this specific area, wet-bench RNA-Seq was performed on the germinating embryo samples isolated from water (W)-imbibed seeds and those treated with mannitol (M, an osmotic reagent), mannitol, and mannitol plus fullerol (F, a nano-antioxidant). The phenotypic evaluation showed that M treatment had a substantial inhibitory effect on radicle growth, whereas the supplement of F significantly mitigated this inhibition at all concentrations, in particular, 100 mg/mL in MF3, which increased the radicle length by ~33% as compared to that under solely M treatment (Fig. 5). The expression values of PsAQP genes were removed from the RNA-Seq data, and pairwise comparisons were made within the Group 1: W vs M, and Group 2: W vs MF3, where a total of ten and nince AQPs were identified as differentially expressed genes (DEGs), respectively (Fig. 6). In Group 1, six DEGs were up-regulated and four DEGs down-regulated, whereas in Group 2, six DEGs were up-regulated and three DEGs down-regulated. Four genes viz. PsPIPs2-5, PsNIP6-3, PsTIP2-3, and PsTIP3-2 were found to be similarly regulated by M or MF3 treatment (Fig. 6), indicating that their regulation by osmotic stress couldn't be mitigated by fullerol. Three genes, all being PsNIPs (1-1, 2-1-2, and 4-2), were up-regulated only under mannitol treatment without fullerol, suggesting that their perturbations by osmotic stress were migrated by the antioxidant activities. In contrast, four other genes namely PsTIP2-2, PsTIP4-1, PsNIP1-5, and PsSIP1-3 were only regulated under mannitol treatment when fullerol was present.

    Figure 5.  The visual phenotype and radicle length of pea seeds treated with water (W), 0.3 M mannitol (M), and fullerol of different concentrations dissolved in 0.3 M mannitol (MF). MF1, MF2, MF3, and MF4 indicated fullerol dissolved in 0.3 M mannitol at the concentration of 10, 50, 100, and 500 mg/L, respectively. (a) One hundred and fifty grains of pea seeds each were used for phenotype analysis at 72 h after treatment. Radicle lengths were measured using a ruler in three replicates R1, R2, and R3 in all the treatments. (b) Multiple comparison results determined using the SSR-Test method were shown with lowercase letters to indicate statistical significance (P < 0.05).
    Figure 6.  Venn diagram showing the shared and unique differentially expressed PsAQP genes in imbibing seeds under control (W), Mannitol (M) and Mannitol + Fullerol (MF3) treatments. Up-regulation (UG): PsPIP2-5, PsNIP1-1, PsNIP2-1-2, PsNIP4-2, PsNIP6-3, PsNIP1-5, PsTIP2-2, PsTIP4-1, PsSIP1-3, PsXIP2-1; Down-regulation (DG): PsTIP2-3, PsTIP3-2, PsNIP1-7, PsNIP5-1, PsXIP2-1.

    As a validation of the RNA-Seq data, eight genes showing differential expressions in imbibing seeds under M or M + F treatments were selected for qRT-PCR analysis, which was PsTIP4-1, PsTIP2-2, PsTIP2-3, PsTIP3-2, PsPIP2-5, PsXIP2-1, PsNIP6-3 and PsNIP1-5 shown in Fig 6, the expression modes of all the selected genes but PsXIP2-1 were well consistent between the RNA-Seq and the qRT-PCR data. PsXIP2-1, exhibiting slightly decreased expression under M treatment according to RNA-Seq, was found to be up-regulated under the same treatment by qRT-PCR (Fig. 7). This gene was therefore removed from further discussions.

    Figure 7.  The expression patterns of seven PsAQPs in imbibing seeds as revealed by RNA-Seq and qRT-PCR. The seeds were sampled after 12 h soaking in three different solutions, namely water (W), 0.3 M mannitol (M), and 100 mg/L fullerol dissolved in 0.3 M mannitol (MF3) solution. Error bars are standard errors calculated from three replicates.

    This study used the recently available garden pea genome to perform genome-wide identification of AQPs[35] to help understand their functions in plant growth and development. A total of 39 putative full-length AQPs were found in the garden pea genome, which is very similar to the number of AQPs identified in many other diploid legume crops such as 40 AQPs genes in pigeon pea, chickpea, common bean[7,47,48], and 44 AQPs in Medicago[49]. On the other hand, the number of AQP genes in pea is greater compared to diploid species like rice (34)[4], Arabidopsis thaliana (35)[3], and 32 and 36 in peanut A and B genomes, respectively[8]. Phylogenetic analysis assigned the pea AQPs into all five subfamilies known in plants, whereas the presence of only one XIP in this species seems less than the number in other diploid legumes which have two each in common bean and Medicago[5,48,49]. The functions of the XIP-type AQP will be of particular interest to explore in the future.

    The observed exon-intron structures in pea AQPs were found to be conserved and their phylogenetic distribution often correlated with these structures. Similar exon-intron patterns were seen in PIPs and TIPs subfamily of Arabidopsis, soybean, and tomato[3,6,50]. The two conserved NPA motifs and the four amino acids forming the ar/R SF mostly regulate solute specificity and transport of the substrate across AQPs[47,51]. According to our analysis, all the members of each AQP subfamilies in garden pea showed mostly conserved NPA motifs and a similar ar/R selective filter. Interestingly, most PsPIPs carry double NPA in LB and LE and a hydrophilic ar/R SF (F/H/T/R) as observed in three legumes i.e., common bean[48], soybean[5] chickpea[7], showing their affinity for water transport. All the TIPs of garden pea have double NPA in LB and LE and wide variation at selectivity filters. Most PsTIP1s (1-1, 1-3, 1-4, and 1-7) were found with H-I-A-V ar/R selectivity filter similar to other species such as Medicago, Arachis, and common bean, that are reported to transport water and other small molecules like boron, hydrogen peroxide, urea, and ammonia[52]. Compared with related species, the TIPs residues in the ar/R selectivity filter were very similar to those in common bean[48], Medicago[49], and Arachis[8]. In the present study, the NIPs, NIP1s (1-3, 1-5, 1-6, and1-7), and NIP2-2-2 genes have G-S-G-R selectivity. Interestingly, NIP2s with a G-S-G-R selectivity filter plays an important role in silicon influx (Si) in many plant species such as Soybean and Arachis[6,8]. It was reported that Si accumulation protects plants against various types of biotic and abiotic stresses[53].

    The subcellular localization investigation suggested that most of the PsAQPs were localized to the plasma membrane or vacuolar membrane. The members of the PsPIPs, PsNIPs, and PsXIP subfamilies were mostly located in the plasma membrane, whereas members of the PsTIPs subfamily were often predicted to localize in the vacuolar membrane. Similar situations were reported in many other legumes such as common bean, soybean, and chickpea[5,7,48]. Apart from that, PsSIPs subfamily were predicted to localize to the plasma membrane or vacuolar membrane, and some AQPs were likely to localize in broader subcellular positions such as the nucleus, cytosol, and chloroplast, which indicates that AQPs may be involved in various molecular transport functions.

    AQPs have versatile physiological functions in various plant organs. Analysis of RNA-Seq data showed a moderate to high expression of the PsPIPs in either root or green tissues except for PsPIP2-4, indicating their affinity to water transport. In several other species such as Arachis[8], common bean[48], and Medicago[49], PIPs also were reported to show high expressions and were considered to play an important role to maintain root and leaf hydraulics. Also interestingly, PsTIP2-3 and PsTIP3-2 showed high expressions exclusively in seeds at 5 d after imbibition, indicating their specific roles in seed germination. Earlier, a similar expression pattern for TIP3s was reported in Arabidopsis during the initial phase of seed germination and seed maturation[54], soybean[6], canola[55], and Medicago[49], suggesting that the main role of TIP3s in regulating seed development is conserved across species.

    Carbon nanoparticles such as fullerol have a wide range of potential applications as well as safety concerns in agriculture. Fullerol has been linked to plant protection from oxidative stress by influencing ROS accumulation and activating the antioxidant system in response to drought[56]. The current study revealed that fullerol at an adequate concentration (100 mg/L), had favorable effects on osmotic stress alleviation. In this study, the radical growth of germinating seeds was repressed by the mannitol treatment, and many similar observations have been found in previous studies[57]. Furthermore, mannitol induces ROS accumulation in plants, causing oxidative stress[58]. Our work further validated that the radical growth of germinating seeds were increased during fullerol treatment. Fullerol increased the length of roots and barley seeds, according to Panova et al.[32]. Fullerol resulted in ROS detoxification in seedlings subjected to water stress[32].

    Through transcriptomic profiling and qRT-PCR, several PsAQPs that responded to osmotic stress by mannitol and a combination of mannitol and fullerol were identified. Most of these differentially expressed AQPs belonged to the TIP and NIP subfamilies. (PsTIP2-2, PsTIP2-3, and PsTIP 3-2) showed higher expression by mannitol treatment, which is consistent with the fact that many TIPs in other species such as GmTIP2;3 and Eucalyptus grandis TIP2 (EgTIP2) also showed elevated expressions under osmotic stress[54,59]. The maturation of the vacuolar apparatus is known to be aided by the TIPs, which also enable the best possible water absorption throughout the growth of embryos and the germination of seeds[60]. Here, the higher expression of PsTIP (2-2, 2-3, and 3-2) might help combat water deficiency in imbibing seeds due to osmotic stress. The cellular signals triggering such transcriptional regulation seem to be independent of the antioxidant system because the addition of fullerol didn’t remove such regulation. On the other hand, the mannitol-induced regulation of most PsNIPs were eliminated when fullerol was added, suggesting either a response of these NIPs to the antioxidant signals or being due to the mitigated cellular stress. Based on our experimental data and previous knowledge, we propose that the fullerol-induced up- or down-regulation of specific AQPs belonging to different subfamilies and locating in different subcellular compartments, work coordinatedly with each other, to maintain the water balance and strengthen the tolerance to osmotic stress in germinating pea seeds through reduction of ROS accumulation and enhancement of antioxidant enzyme levels. Uncategorized X intrinsic proteins (XIPs) Aquaporins are multifunctional channels that are accessible to water, metalloids, and ROS.[32,56]. Due likely to PCR bias, the expression data of PsXIP2-1 from qRT-PCR and RNA-Seq analyses didn’t match well, hampering the drawing of a solid conclusion about this gene. Further studies are required to verify and more deeply dissect the functions of each of these PsAQPs in osmotic stress tolerance.

    A total of 39 full-length AQP genes belonging to five sub-families were identified from the pea genome and characterized for their sequences, phylogenetic relationships, gene structures, subcellular localization, and expression profiles. The number of AQP genes in pea is similar to that in related diploid legume species. The RNA-seq data revealed that PsTIP (2-3, 3-2) showed high expression in seeds for 5 d after imbibition, indicating their possible role during the initial phase of seed germination. Furthermore, gene expression profiles displayed that higher expression of PsTIP (2-3, 3-2) in germinating seeds might help maintain water balance under osmotic stress to confer tolerance. Our results suggests that the biological functions of fullerol in plant cells are exerted partly through the interaction with AQPs.

    Under Bio project ID PRJNA793376 at the National Center for Biotechnology Information, raw data of sequencing read has been submitted. The accession numbers for the RNA-seq raw data are stored in GenBank and are mentioned in Supplemental Table S4.

    This study is supported by the National Key Research & Development Program of China (2022YFE0198000) and the Key Research Program of Zhejiang Province (2021C02041).

  • Pei Xu is the Editorial Board member of journal Vegetable Research. He was blinded from reviewing or making decisions on the manuscript. The article was subject to the journal's standard procedures, with peer-review handled independently of this Editorial Board member and his research group.

  • Supplemental Table S1 Kinase domain annotation of 967 pecan PKs.
    Supplemental Table S2 Family classification of pecan PKs with related information.
    Supplemental Table S3 Domain organization of pecan PKs.
    Supplemental Table S4 List of pecan protein kinases containing multiple kinase domains.
    Supplemental Table S5 GO IDs of pecan PKs.
    Supplemental Table S6 Segmental duplication events and Ka/Ks values of pecan protein kinases.
    Supplemental Table S7 FPKM values of pecan PK genes during embryo development.
    Supplemental Table S8 Genes in eight groups with different expression patterns during pecan embryo developmen.
    Supplemental Table S9 Average FPKM expression values of pecan PK genes under drought stress.
    Supplemental Table S10 Differentially expressed PK genes in six clusters with different expression patterns under drought stress.
    Supplemental Fig. S1 Phylogenetic classification of pecan PKs. The phylogenetic tree was generated with amino sequences of the kinase domain with maximum-likelihood method. PK subfamilies were highlighted with different colors.
    Supplemental Fig. S2 Phylogenetic classification of PKs in four different species. The phylogenetic tree was generated with amino sequences of the kinase domain from four different species (967 from pecan, 1006 from Arabidopsis, 1168 from grape and 758 from pineapple) with maximum-likelihood method.
    Supplemental Fig. S3 Expression of PK genes during embryo development in pecan. Log2 (FPKM+1) values were performed according to the red-white-blue color scale, and the heatmaps were generated using the R language with hierarchical clustering.
    Supplemental Fig. S4 Different expression patterns of pecan PK genes during embryo development.
    Supplemental Fig. S5 Expression data of PK subfamilies with drought treatment in pecan. Log2 (FPKM+1) values were performed according to the red-white-blue color scale, and the heatmaps were generated using R language with hierarchical clustering.
  • [1] Bennett J. 1991. Protein phosphorylation in green plant chloroplasts. Annual Review of Plant Physiology and Plant Molecular Biology 42:281−311 doi: 10.1146/annurev.pp.42.060191.001433

    CrossRef   Google Scholar

    [2] Stone JM, Walker JC. 1995. Plant protein kinase families and signal transduction. Plant Physiology 108:451−57 doi: 10.1104/pp.108.2.451

    CrossRef   Google Scholar

    [3] Champion A, Kreis M, Mockaitis K, Picaud A, Henry Y. 2004. Arabidopsis kinome: after the casting. Functional & Integrative Genomics 4:163−87 doi: 10.1007/s10142-003-0096-4

    CrossRef   Google Scholar

    [4] Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S. 2002. The protein kinase complement of the human genome. Science 298:1912−34 doi: 10.1126/science.1075762

    CrossRef   Google Scholar

    [5] Hanks SK, Quinn AM, Hunter T. 1988. The protein kinase family: conserved features and deduced phylogeny of the catalytic domains. Science 241:42−52 doi: 10.1126/science.3291115

    CrossRef   Google Scholar

    [6] Lehti-Shiu MD, Shiu SH. 2012. Diversity, classification and function of the plant protein kinase superfamily. Philosophical Transactions of the Royal Society B - Biological Sciences 367:2619−39 doi: 10.1098/rstb.2012.0003

    CrossRef   Google Scholar

    [7] Liu J, Chen N, Grant JN, Cheng ZM, Stewart CN Jr, et al. 2015. Soybean kinome: functional classification and gene expression patterns. Journal of Experimental Botany 66:1919−34 doi: 10.1093/jxb/eru537

    CrossRef   Google Scholar

    [8] Zhu K, Wang X, Liu J, Tang J, Cheng Q, et al. 2018. The grapevine kinome: annotation, classification and expression patterns in developmental processes and stress responses. Horticulture Research 5:19 doi: 10.1038/s41438-018-0027-0

    CrossRef   Google Scholar

    [9] Hanada K, Zou C, Lehti-Shiu MD, Shinozaki K, Shiu SH. 2008. Importance of lineage-specific expansion of plant tandem duplicates in the adaptive response to environmental stimuli. Plant Physiology 148:993−1003 doi: 10.1104/pp.108.122457

    CrossRef   Google Scholar

    [10] Lehti-Shiu MD, Zou C, Hanada K, Shiu SH. 2009. Evolutionary history and stress regulation of plant receptor-like kinase/pelle genes. Plant Physiology 150:12−26 doi: 10.1104/pp.108.134353

    CrossRef   Google Scholar

    [11] Dardick C, Chen J, Richter T, Ouyang S, Ronald P. 2007. The rice kinase database. A phylogenomic database for the rice kinome. Plant Physiology 143:579−86 doi: 10.1104/pp.106.087270

    CrossRef   Google Scholar

    [12] Gish LA, Clark SE. 2011. The RLK/Pelle family of kinases. The Plant Journal 66:117−27 doi: 10.1111/j.1365-313X.2011.04518.x

    CrossRef   Google Scholar

    [13] Zhu K, Fan P, Mo Z, Tan P, Feng G, et al. 2020. Identification, expression and co-expression analysis of R2R3-MYB family genes involved in graft union formation in pecan (Carya illinoinensis). Forests 11:917 doi: 10.3390/f11090917

    CrossRef   Google Scholar

    [14] Guo W, Chen J, Li J, Huang J, Wang Z, et al. 2020. Portal of Juglandaceae: A comprehensive platform for Juglandaceae study. Horticulture Research 7:35 doi: 10.1038/s41438-020-0256-x

    CrossRef   Google Scholar

    [15] Huang Y, Xiao L, Zhang Z, Zhang R, Wang Z, et al. 2019. The genomes of pecan and Chinese hickory provide insights into Carya evolution and nut nutrition. Gigascience 8:giz036 doi: 10.1093/gigascience/giz036

    CrossRef   Google Scholar

    [16] Panchy N, Lehti-Shiu M, Shiu SH. 2016. Evolution of gene duplication in plants. Plant Physiology 171:2294−316 doi: 10.1104/pp.16.00523

    CrossRef   Google Scholar

    [17] Cannon SB, Mitra A, Baumgarten A, Young ND, May G. 2004. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biology 4:10 doi: 10.1186/1471-2229-4-10

    CrossRef   Google Scholar

    [18] Zhu K, Chen F, Liu J, Chen X, Hewezi T, et al. 2016. Evolution of an intron-poor cluster of the CIPK gene family and expression in response to drought stress in soybean. Scientific Reports 6:28225 doi: 10.1038/srep28225

    CrossRef   Google Scholar

    [19] Chen X, Ding Y, Yang Y, Song C, Wang B, et al. 2021. Protein kinases in plant responses to drought, salt, and cold stress. Journal of Integrative Plant Biology 63:53−78 doi: 10.1111/jipb.13061

    CrossRef   Google Scholar

    [20] Ferreira-Neto JRC, Borges AN da C, da Silva MD, Morais DA de L, Bezerra-Neto JP, et al. 2021. The cowpea kinome: genomic and transcriptomic analysis under biotic and abiotic stresses. Frontiers in Plant Science 12:667013 doi: 10.3389/fpls.2021.667013

    CrossRef   Google Scholar

    [21] Zhu J. 2016. Abiotic stress signaling and responses in plants. Cell 167:313−24 doi: 10.1016/j.cell.2016.08.029

    CrossRef   Google Scholar

    [22] Bundó M, Coca M. 2017. Calcium-dependent protein kinase OsCPK10 mediates both drought tolerance and blast disease resistance in rice plants. Journal of Experimental Botany 68:2963−75 doi: 10.1093/jxb/erx145

    CrossRef   Google Scholar

    [23] Andrási N, Rigó G, Zsigmond L, Pérez-Salamó I, Papdi C, et al. 2019. The mitogen-activated protein kinase 4-phosphorylated heat shock factor A4A regulates responses to combined salt and heat stresses. Journal of Experimental Botany 70:4903−18 doi: 10.1093/jxb/erz217

    CrossRef   Google Scholar

    [24] Wei K, Wang Y, Zhong X, Pan S. 2014. Protein kinase structure, expression and regulation in maize drought signaling. Molecular Breeding 34:583−602 doi: 10.1007/s11032-014-0059-6

    CrossRef   Google Scholar

    [25] Zhu K, Liu H, Chen X, Cheng Q, Cheng ZM. 2018. The kinome of pineapple: catalog and insights into functions in crassulacean acid metabolism plants. BMC Plant Biology 18:199 doi: 10.1186/s12870-018-1389-z

    CrossRef   Google Scholar

    [26] Hindle MM, Martin SF, Noordally ZB, van Ooijen G, Barrios-Llerena ME, et al. 2014. The reduced kinome of Ostreococcus tauri: core eukaryotic signalling components in a tractable model species. BMC Genomics 15:640 doi: 10.1186/1471-2164-15-640

    CrossRef   Google Scholar

    [27] Zulawski M, Schulze G, Braginets R, Hartmann S, Schulze WX. 2014. The Arabidopsis Kinome: phylogeny and evolutionary insights into functional diversification. BMC Genomics 15:548 doi: 10.1186/1471-2164-15-548

    CrossRef   Google Scholar

    [28] Dievart A, Gottin C, Périn C, Ranwez V, Chantret N. 2020. Origin and diversity of plant receptor-like kinases. Annual Review of Plant Biology 71:131−56 doi: 10.1146/annurev-arplant-073019-025927

    CrossRef   Google Scholar

    [29] Maere S, De Bodt S, Raes J, Casneuf T, Van Montagu M, et al. 2005. Modeling gene and genome duplications in eukaryotes. PNAS 102:5454−59 doi: 10.1073/pnas.0501102102

    CrossRef   Google Scholar

    [30] Zhang Z, Li J, Zhao X, Wang J, Wong GKC, et al. 2006. KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinformatics 4:259−63 doi: 10.1016/S1672-0229(07)60007-2

    CrossRef   Google Scholar

    [31] Hou J, Wei S, Pan H, Zhuge Q, Yin T. 2019. Uneven selection pressure accelerating divergence of Populus and Salix. Horticulture Research 6:37 doi: 10.1038/s41438-019-0121-y

    CrossRef   Google Scholar

    [32] Antolín-Llovera M, Ried MK, Binder A, Parniske M. 2012. Receptor kinase signaling pathways in plant-microbe interactions. Annual Review of Phytopathology 50:451−73 doi: 10.1146/annurev-phyto-081211-173002

    CrossRef   Google Scholar

    [33] Liang X, Zhou JM. 2018. Receptor-like cytoplasmic kinases: central players in plant receptor kinase–mediated signaling. Annual Review of Plant Biology 69:267−99 doi: 10.1146/annurev-arplant-042817-040540

    CrossRef   Google Scholar

    [34] Chandran AKN, Yoo YH, Cao P, Sharma R, Sharma M, et al. 2016. Updated Rice Kinase Database RKD 2.0: enabling transcriptome and functional analysis of rice kinase genes. Rice 9:40 doi: 10.1186/s12284-016-0106-5

    CrossRef   Google Scholar

    [35] Nodine MD, Yadegari R, Tax FE. 2007. RPK1 and TOAD2 are two receptor-like kinases redundantly required for Arabidopsis embryonic pattern formation. Developmental Cell 12:943−56 doi: 10.1016/j.devcel.2007.04.003

    CrossRef   Google Scholar

    [36] Li J. 2010. Multi-tasking of somatic embryogenesis receptor-like protein kinases. Current Opinion in Plant Biology 13:509−14 doi: 10.1016/j.pbi.2010.09.004

    CrossRef   Google Scholar

    [37] Wang R, Li L, Cao Z, Zhao Q, Li M, et al. 2012. Molecular cloning and functional characterization of a novel apple MdCIPK6L gene reveals its involvement in multiple abiotic stress tolerance in transgenic plants. Plant Molecular Biology 79:123−35 doi: 10.1007/s11103-012-9899-9

    CrossRef   Google Scholar

    [38] Meng D, Dong B, Niu L, Song Z, Wang L, et al. 2021. The pigeon pea CcCIPK14-CcCBL1 pair positively modulates drought tolerance by enhancing flavonoid biosynthesis. Plant Journal 106:1278−97 doi: 10.1111/tpj.15234

    CrossRef   Google Scholar

    [39] Lu L, Chen X, Wang P, Lu Y, Zhang J, et al. 2021. CIPK11: a calcineurin B-like protein-interacting protein kinase from Nitraria tangutorum, confers tolerance to salt and drought in Arabidopsis. BMC Plant Biology 21:123 doi: 10.1186/s12870-021-02878-x

    CrossRef   Google Scholar

    [40] Fujii H, Verslues PE, Zhu JK. 2011. Arabidopsis decuple mutant reveals the importance of SnRK2 kinases in osmotic stress responses in vivo. PNAS 108:1717−22 doi: 10.1073/pnas.1018367108

    CrossRef   Google Scholar

    [41] Gao L, Xue H. 2012. Global analysis of expression profiles of rice receptor-like kinase genes. Molecular Plant 5:143−53 doi: 10.1093/mp/ssr062

    CrossRef   Google Scholar

    [42] Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, et al. 2016. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Research 44:D279−D285 doi: 10.1093/nar/gkv1344

    CrossRef   Google Scholar

    [43] Eddy SR. 1998. Profile hidden Markov models. Bioinformatics 14:755−63 doi: 10.1093/bioinformatics/14.9.755

    CrossRef   Google Scholar

    [44] Letunic I, Khedkar S, Bork P. 2021. SMART: recent updates, new developments and status in 2020. Nucleic Acids Research 49:D458−D460 doi: 10.1093/nar/gkaa937

    CrossRef   Google Scholar

    [45] Katoh K, Rozewicki J, Yamada KD. 2019. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Briefings in Bioinformatics 20:1160−66 doi: 10.1093/bib/bbx108

    CrossRef   Google Scholar

    [46] Price MN, Dehal PS, Arkin AP. 2009. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Molecular Biology and Evolution 26:1641−50 doi: 10.1093/molbev/msp077

    CrossRef   Google Scholar

    [47] Yu CS, Chen YC, Lu CH, Hwang JK. 2006. Prediction of protein subcellular localization. Proteins 64:643−51 doi: 10.1002/prot.21018

    CrossRef   Google Scholar

    [48] Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, et al. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25:3389−402 doi: 10.1093/nar/25.17.3389

    CrossRef   Google Scholar

    [49] Wang Y, Li J, Paterson AH. 2013. MCScanX-transposed: detecting transposed gene duplications based on multiple colinearity scans. Bioinformatics 29:1458−60 doi: 10.1093/bioinformatics/btt150

    CrossRef   Google Scholar

    [50] Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, et al. 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23:2947−48 doi: 10.1093/bioinformatics/btm404

    CrossRef   Google Scholar

    [51] Chen C, Chen H, Zhang Y, Thomas HR, Frank MH, et al. 2020. TBtools: an integrative toolkit developed for interactive analyses of big biological data. Molecular Plant 13:1194−202 doi: 10.1016/j.molp.2020.06.009

    CrossRef   Google Scholar

    [52] Kim D, Langmead B, Salzberg SL. 2015. HISAT: a fast spliced aligner with low memory requirements. Nature Methods 12:357−60 doi: 10.1038/nmeth.3317

    CrossRef   Google Scholar

    [53] Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, et al. 2015. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology 33:290−95 doi: 10.1038/nbt.3122

    CrossRef   Google Scholar

    [54] Li B, Dewey CN. 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12:323 doi: 10.1186/1471-2105-12-323

    CrossRef   Google Scholar

    [55] Love MI, Huber W, Anders S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15:550 doi: 10.1186/s13059-014-0550-8

    CrossRef   Google Scholar

    [56] Ernst J, Bar-Joseph Z. 2006. STEM: a tool for the analysis of short time series gene expression data. BMC Bioinformatics 7:191 doi: 10.1186/1471-2105-7-191

    CrossRef   Google Scholar

    [57] Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, et al. 2003. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Research 13:2498−504 doi: 10.1101/gr.1239303

    CrossRef   Google Scholar

  • Cite this article

    Zhu K, Fan P, Liu H, Zhao J, Tan P, et al. 2021. Pecan kinome: classification and expression analysis of all protein kinases in Carya illinoinensis. Forestry Research 1: 14 doi: 10.48130/FR-2021-0014
    Zhu K, Fan P, Liu H, Zhao J, Tan P, et al. 2021. Pecan kinome: classification and expression analysis of all protein kinases in Carya illinoinensis. Forestry Research 1: 14 doi: 10.48130/FR-2021-0014

Figures(7)

Article Metrics

Article views(6776) PDF downloads(1194)

ARTICLE   Open Access    

Pecan kinome: classification and expression analysis of all protein kinases in Carya illinoinensis

Forestry Research  1 Article number: 14  (2021)  |  Cite this article

Abstract: Protein kinases (PKs) are involved in plant growth and stress responses, and constitute one of the largest superfamilies due to numerous gene duplications. However, limited PKs have been functionally described in pecan, an economically important nut tree. Here, the comprehensive identification, annotation and classification of the entire pecan kinome are reported. A total of 967 PK genes were identified from the pecan genome, and further classified into 20 different groups and 121 subfamilies using the kinase domain sequences, which were verified by phylogenetic analysis. The receptor-like kinase (RLK) group contained 565 members, which constituted the largest group. Gene duplication contributed to the expansion of pecan kinome, 169 segmental duplication events including 285 PK genes were found, and the Ka/Ks ratio revealed they experienced strong negative selection. The RNA-Seq data of PK genes in pecan were further analyzed at the subfamily level, and different PK subfamilies performed various expression patterns across pecan embryo development or drought treatment, suggesting PK genes in pecan are involved in embryo development and drought stress response. Taken together, this study provides insight into the classification, expansion, evolution, and expression of pecan PKs. Our findings regarding expansion, expression and co-expression analyses lay a good foundation for future research to understand the roles of pecan PKs, and more efficiently determine the key candidate genes.

    • Reversible phosphorylation is a common type of post-translational modification, which is catalyzed by protein kinases (PKs), widely existing in living organisms[1]. PKs regulate the activity of downstream target proteins via transferring the phosphates to phosphorylate specific amino acids including serine, threonine or tyrosine as molecular switches[2]. PKs constitute a super gene family with a large number of members in plants, and the entire PKs in a genome are defined as the kinome. More than 1000 PK genes were found in Arabidopsis, representing about 4% of the genome[3]. However, only 518 putative PKs were identified in the human genome, which constitutes 1.7% of entire human genes[4].

      In general, PKs have a catalytic domain ranging from 250 to 300 amino acid residues. This superfamily was first classified into various subfamilies based on the phylogenetic analysis of the catalytic domain sequences[5]. In recent years, hundreds of plant genome sequences have been released, providing an excellent opportunity in the understanding of the evolutionary history of plant PKs. Kinomes from 25 plant species were identified and further classified into nine major groups with 115 families, and the PKs experienced huge expansion in flowering plants[6]. In soybean, 2,166 putative PKs were found, and divided into 19 groups and 122 subfamilies[7]. In the grapevine kinome, 1,168 PK genes were classified into 20 main groups and 121 subfamilies, the RLK-Pelle was the largest group with 872 PKs[8]. The huge expansion of kinome in flowering plants is due to gene duplication and a good retention rate of duplicates in some groups, especially the RLK-Pelle group[9]. Only four Interleukin Receptor-Associated Kinase (IRAK) genes have been found in the human genome, which perform a close relationship with plant RLK-Pelle group[10].

      Functional characterization studies of plant PK genes have mainly occurred in model plants such as Arabidopsis and rice, and PKs have been proven to play key roles involved in various biological processes[3, 11, 12]. However, few PK genes have been functionally analyzed in non-model plants, especially in perennial woody plants.

      The pecan tree [Carya illinoinensis (Wangenh.) K. Koch] is a well known commercially cultivated nut tree worldwide, which is native to North America and Mexico[13]. Pecan is a member of the Juglandaceae family in the genus Carya, and the delicious nuts are a good source of unsaturated fatty acids, flavonoids and protein for human benefit seeing an increase in consumption in recent years[14]. In 2018, the United States of America, produced over 130,000 tons of pecan nuts, with a total production value approaching $600 million (https://www.nass.usda.gov). Recently, the release of the pecan genome and transcriptome data has allowed characterization of the pecan kinome, duplication events, and their expression patterns under different conditions[15]. In the current study, 967 pecan PKs were identified and further classified into different groups and subfamilies. Conserved domain sequence features and phylogenetic relationships of different subfamily members were also evaluated. Subsequently, the expression patterns and co-expression networks of various subfamilies were analyzed to more efficiently determine the key members. Collectively, the comprehensive annotation of pecan PK genes and expression files helps us to understand the potential roles of pecan protein kinases.

    • All pecan protein sequences were aligned against the kinase domains by HMMER, and a total of 1,112 candidates were identified following exclusion of redundant sequences. The coverage of kinase domains of 1,112 protein alignments were then evaluated, and 967 were identified as typical PKs which contained at least 50% of the domain alignments (Supplemental Table S1)[6]. These pecan PKs were classified into different groups and families using HMM search method, and 11 were found to provide a different result through phylogenetic analysis (Supplemental Fig. S1). The 11 PK genes also had low E-values, were not clustered with any of the known PK subfamilies, and placed in an unclassified cluster (named as 'UNKNOWN'). The remaining 956 pecan PKs were further classified into 121 subfamilies, corresponding to 20 groups (Supplemental Fig. S1). The receptor-like kinase (RLK) group contained 59 subfamilies and 565 members, which accounted for 58.4% and comprised the largest group in pecan kinome. The other six major groups included CAMK (94), CMGC (87), TKL (65), STE (46), AGC (39), CK1 (17). Similar to the groups, the size of the subfamilies was also greatly variable and varied from one to 64 genes (Fig. 1).

      Figure 1.  Phylogenetic analyses and classification of PKs identified in the pecan genome. The phylogenetic tree of the 967 PKs in pecan was constructed by kinase domain sequences and classified into 121 subfamilies. Branches were colored to represent two major clades, the RLK clade was marked as orange, and the non-RLK clade was marked as grey.

      To gain insight into the evolutionary relationships of the PK families, a phylogenetic tree was built using the kinase domain sequences from four plant species including Arabidopsis, pecan, grape, and pineapple genomes (Supplemental Fig. S2). The pecan was phylogenetically closer to grape and Arabidopsis, the three species belonged to the dicotyledons, and the pineapple was a monocotyledon. Twenty-three PK subfamilies in pecan contained one member, examples being: CMGC_CDK-CDK8, CMGC_PI-Tthe, PEK_GCN2, RLK-Pelle_C-LEC, RLK-Pelle_RLCK-VIIb. Interestingly, these subfamilies were also highly conserved in grape, Arabidopsis and pineapple kinomes, suggesting the expansion of these subfamilies was limited. TKL-Pl-8 was only found in pineapple and grape, and absent in pecan and Arabidopsis, while the SCY1_SCYL1 subfamily was absent in pecan and pineapple kinomes. The RLK-Pelle_DLSV was the largest subfamily in all four kinomes, 158 subfamily members were identified in grape, while only 64 and 41 were found in pecan and pineapple kinomes, respectively. The RLK-Pelle_LRR-XI-1 was comprised of 52 members in pecan, whereas Arabidopsis and pineapple contained 33 and 27 members respectively.

    • The identified PK proteins consisted of 149−1,634 amino acids, and the predicted molecular weight (MW) values varied from 17.24 kDa (CIL0895S0070) to 180.1 kDa (CIL1226S0042). The theoretical isoelectric points (pI) of the PK proteins ranged from 4.49 to 9.85, indicating they might function in various microenvironments.

      To analyze the structural diversity of the pecan PKs in various subfamilies, the intron numbers of PK genes were collected. The intron numbers of genes in the pecan kinome varied widely from 0 to 30, with 127 being intronless, while 205 (21.2%) of them possessed at least ten introns. The average intron number of the 967 pecan kinase genes was six, and CIL1158S0028 contained the largest number of introns (Supplemental Table S2). After comparing the exon/intron arrangement in various subfamilies, we found that intron numbers in nine subfamilies were highly conserved. For example, all nine genes in RLK-Pelle_CR4L subfamily were intronless, and all RLK-Pelle_LRR-XIIIb members contained 26 introns. However, the gene structure of PK genes in some subfamilies were highly variable; for example, 13 CAMK_CAMKL-CHK1 subfamily members contained more than nine introns, whereas the intron numbers of the 17 remaining members were less than two.

      The subcellular location information can also be used to predict gene functions, and the subcellular localizations of pecan PK proteins were predicted according to CELLO software (Supplemental Table S2). Based on the results obtained, we found 30.8% (298/967) of PKs in pecan were predicted to localize to the plasma membrane, and most of them (280) were members of RLK groups. Intriguingly, 71.8% of PKs in the AGC group were localized to the nucleus, while 73.4% of members in the CAMK group were localized to the cytoplasm. Over half of PK genes in the CK1 group were localized to the mitochondria, and about 50% of CMGC and TKL members were predicted to localize to the nucleus (Fig. 2). Members in only 18 subfamilies were predicted to have the same subcellular locations, however, other PK genes within the same subfamilies were localized to different cellular compartments (Supplemental Table S2).

      Figure 2.  Subcellular localizations of pecan PK genes in different groups predicted by CELLO.

      Conserved domains of 967 PK proteins in pecan were detected, and about half of them (489/967) only contained one kinase catalyst (Supplemental Table S3). The remaining 487 PKs with additional domains were investigated, and found in AGC (82.05%), TKL (72.31%), CAMK (63.83%), RLK (54.34%), STE (26.09%), and CMGC (10.34%) groups, indicating that different groups performed multiple domain compositions. Members in each subfamily commonly showed similar domain organizations, for example, all RLK-Pelle_L-LEC members contained an additional Lectin_legB domain, suggesting that they might share a common evolutionary history. In total, 43 PKs were identified from 16 subfamilies, which had two conserved kinase domains, including 20 AGC, 15 RLK and six CMGC group members (Supplemental Table S4).

    • Three main gene ontology (GO) categories include biological processes (BP), molecular functions (MF) and cellular components (CC), GO analysis can help to predict the various functions of PK proteins. Therefore, the GO annotations of 967 PKs in the pecan kinome were investigated (Fig. 3 and Supplemental Table S5), it was found that these PKs were involved in the three GO categories. The largest fractions of the GO terms (43.56%) were related to molecular functions, and 34.05% were associated with biological processes, while only 22.4% were involved in cellular components (Fig. 3a).

      Figure 3.  Gene ontology (GO) analyses of pecan PKs. (a) Pie charts indicated the relative proportion of the GO terms in the pecan kinome. (b) Detailed annotations in the different biological process (BP), cellular component (CC), and molecular function (MF) categories.

      Functional GO terms for the 967 pecan PKs were assessed, and the top eleven GO terms identified in more than 100 PKs were listed (Fig. 3b). According to the BP results, 71.98% PKs (696) in pecan were associated with cellular protein modification process (GO:0006464), 221 and 249 were related to protein phosphorylation (GO:0006468) and signal transduction (GO:0007165), respectively, suggesting most PKs participated in various biological processes by modifying protein functions. The top four GO terms in the MF category included kinase activity (GO:0016301), ion binding (GO:0043167), ATP binding (GO:0005524), and protein serine/threonine kinase activity (GO:0004674). Additionally, in the CC category, the GO terms were related to different cellular components including plasma membrane (GO:0005886), cytoplasm (GO:0005737), and nucleus (GO:0005634), which was consistent with the previous results of subcellular localization prediction of PKs.

    • Gene duplications functioned in the expansion of the pecan kinome, and the gene copies generated by duplication contributed largely to the evolution of novel functions and environmental adaptation[16]. Segmental duplications occur frequently in higher plants since most of them are diploidized polyploids, and retain multiple duplicated chromosomal blocks within the genomes[17]. The segmental duplication events in the pecan kinome were investigated using MCScanX, and 169 duplication events with 285 PK genes from 63 subfamilies were found, 29.47% of PK members were evolved by segmental duplication, suggesting segmental duplication contributed to the expansion of the pecan kinome (Supplemental Table S6). Among the 285 PK genes, 145 were RLK group members, indicating 25.66% of the RLK group members resulted from segmental duplication. Moreover, 58.82% and 44.68% of the CK1 and CAMK group members resulted from segmental duplication, respectively.

      The non-synonymous substitutions (Ka), and synonymous substitutions (Ks) of the 169 duplication events were calculated, and Ks was applied as a time indicator to evaluate the relative date of duplication blocks (Supplemental Table S6). Among the segmental duplication events in the pecan kinome, the distribution of Ks values showed that the Ks ranged from 0.2 to 3.6, and peaked at 0.3 to 0.4 (Fig. 4a). Intriguingly, 70.41% of the frequency of Ks values were less than 0.5, indicating recent duplications played an important role in the expansion of the pecan kinome. The Ka/Ks values of segmentally duplicated gene pairs were further analyzed to determine the selection pressures influencing sequence divergence. A value of Ka/Ks > 1 represents positive selection, a value of Ka/Ks < 1 represents negative selection, whereas a value of Ka/Ks = 1 indicates neutral selection. Among these duplication events tested, the Ka/Ks values ranged from 0.043 to 0.46, suggesting these PK genes have experienced strong negative selection (Fig. 4b).

      Figure 4.  Distribution of relative Ks (a), and Ka/Ks (b) frequency among segmental duplication events in the pecan kinome.

    • The kernels of pecan nuts are nutritious with a high economic value. To investigate the expression patterns of PK genes in the developing pecan embryo, the expression data of 967 pecan PK genes through three typical stages, PEY1 (early stage), PEY2 (stage with fully extended cotyledons) and PEY3 (fully matured stage) of embryonic development were retrieved (Supplemental Table S7)[15]. According to the hierarchical clustering results of PK genes during embryo development, we found that genes in different PK subfamilies commonly performed various expression patterns (Supplemental Fig. S3). About one-third of PK genes showed very low expression levels in all three stages, such as CIL1226S0040 (CAMK_CDPK), CIL0942S0004 (STE_STE11), CIL1211S0038 (WNK_NRBP), and CIL0895S0070 (RLK-Pelle_DLSV). We also found the majority of the PK genes with low expression levels among pecan embryo development stages were the RLK group members. In contrast, some genes including CIL1119S0056, CIL0940S0189, CIL1575S0001, CIL1611S0012 in the CAMK group and CIL0902S0027, CIL0893S0285, CIL1032S0081 in the CMGC group showed high expression levels in all stages tested. Many other PK genes performed specific expression patterns in different stages.

      According to the expression levels of PK genes during pecan embryo development, we found the PK genes performed various expression patterns. These genes were then divided into eight clusters based on the expression patterns at three stages during embryo development (Supplemental Fig. S4), and three major clusters (cluster 0, 1, and 3) contained more than 100 members (Supplemental Table S8). Cluster 0 had the most PK genes (222) among the eight clusters, and the expression levels of genes in cluster 0 were gradually decreased during embryo development, however, 14 PK genes in cluster 7 were gradually increased.

      To further investigate the relationship between different pecan PK families during embryo development, the expression data of PK genes in each subfamily were averaged and a heatmap with clustering analysis was created (Fig. 5). According to the expression analysis at the subfamily level, these 121 PK gene subfamilies performed distinct expression patterns during embryo development. Some subfamilies such as CAMK_OST1L, CAMK_AMPK, CMGC_RCK, CMGC_CK2 showed high expression levels in all three representative stages. In contrast, several subfamilies in the RLK group including RLK-Pelle_RLCK-XII-2, RLK-Pelle_WAK, RLK-Pelle_LRR-Xb-2, RLK-Pelle_XIII presented low expression levels, which is consistent with previous results (Supplemental Fig. S3). Surprisingly, the remaining subfamilies in the RLK group were highly expressed in the PEY1 stage and down-regulated, indicating they might negatively regulate the embryo development process.

      Figure 5.  Expression analysis of the pecan PK subfamilies during embryo development. The expression data of different PK subfamilies collected from three representative periods of embryonic development of pecan including the early period (PEY1), the period with fully extended cotyledons (PEY2), and the fully matured period of the embryos (PEY3). Log2 (FPKM+1) values were performed according to the red-white-blue color scale. The heatmap was generated using the R package pheatmap with hierarchical clustering.

    • Protein kinases commonly play essential roles in response to environmental stresses including drought stress[10, 18, 19]. In order to explore the expression levels of pecan PK genes under drought stress, the RNA-Seq data were retrieved with FPKM values. In total, the expression data of 952 available genes in response to drought treatment were collected (Supplemental Table S9). The results indicated that PK subfamilies showed different expression patterns in response to drought (Supplemental Fig. S5). About half of the subfamilies in the RLK group exhibited low expression levels. By contrast, most subfamilies in AGC, CAMK, CMGC, CK1, and TKL groups were highly expressed, indicating these subfamily members may play essential roles in response to drought stress.

      To investigate the mutual relationships between pecan PK subfamilies under drought treatment, the co-expression networks were constructed (Fig. 6). The networks contained 112 nodes (PK subfamilies) and 690 edges (co-expression events) with one main network and one subnetwork. The main network had 109 nodes and 688 edges, and each node harbored a different number of edges varying from 1 to 31. Among these PK subfamilies, 30 had more than 20 edges, four subfamilies including AGC_RSK-2, CMGC_MAPK, RLK-Pelle_LRR-XIIIa and RLK-Pelle_RLCK-II had the maximum number of regulatory edges, and were considered as central hubs in the co-expression networks. According to the co-expression events, 421 showed significantly positive correlations, while the remaining 269 showed significantly negative correlations.

      Figure 6.  Co-expression networks of pecan PK subfamilies in response to drought. Each node indicated pecan PK subfamilies, and the edges indicated significant co-expression between subfamilies with a PCC of at least 0.9 (p < 0.01). Blue-colored edges indicate negative correlations, and red-colored edges indicate positive correlations.

      In total, 589 PK genes in the pecan kinome were identified as differentially expressed genes (DEGs) (| log2 (fold change) | ≥ 1, FDR < 0.05). Among them, 257 having a FPKM value < 10 at all time points were considered as lowly expressed genes; 332 PK genes had a FPKM value ≥ 10 in at least one time point. To analyze the expression patterns of pecan PK genes in response to drought stress, the 332 DEGs were further divided into clusters after filtering the lowly expressed PK genes (Fig. 7). Six different clusters with similar expression patterns were performed and members in different clusters ranged from 1 (cluster 3) to 159 (cluster 5), the detailed PK genes in each cluster are shown in Supplemental Table S10. Cluster 5 contained the most numbers of PK genes and the expression levels were gradually increasing under drought stress, however, genes in cluster 0 were gradually decreasing. Interestingly, some DEGs in a subfamily showed similar expression patterns. For example, eight genes in CAMK_CAMKL-CHK1, 11 genes in CAMK_CDPK, 22 genes in RLK-Pelle_DLSV, and seven members in STE_STE11 were all gradually up-regulated in cluster 5, suggesting these PK subfamily members might function in response to drought stress.

      Figure 7.  Temporal changes of differentially expressed PK genes under drought stress in pecan. One-year-old grafted 'Pawnee' trees were subjected to drought by withholding water for 0, 3, 6, 9, 12, and 15 d. Leaf samples were collected and used for RNA-Seq experiments. Expression data of 332 DEGs were retrieved and clustered into six clusters. The PK genes of each cluster are also listed in Supplemental Table S10.

    • Reversible phosphorylation, performed by PKs, is one of the most crucial post-translational modifications, and involved in multiple cellular processes[19, 20]. Although functional analysis of some PKs has been discovered in model plants including Arabidopsis and rice[2123], few PKs have been well understood in woody plants due to limited genome information. The recent release of the Carya illinoinensis genome sequence, an economically important nut tree cultivated worldwide, provides the chance to characterize and understand the regulatory networks of the pecan kinome. In the present research, 967 putative pecan PKs were identified using bioinformatic methods (Supplemental Fig. S1), which accounted for 3.11% (967/31,075) of protein-coding genes in the pecan genome[15]. This proportion of PKs in pecan was lower than that in soybean (4.7%), rice (4.1%), maize (3.8%), and Arabidopsis (3.4%), while higher than that of pineapple (2.8%)[6, 7, 24, 25]. The classification of PKs from 25 plant species showed that gene numbers ranged from 326 to 2535, and the kinome size was significantly larger in the flowering plants, while two algae species including Chlamydomonas reinhardtii and Volvox carteri had 503 and 326 PKs, respectively[6]. Ostreococcus tauri, a unicellular species of green alga, only possessed 133 PKs in its genome, amounting to 1.7%[26].

      Plant kinomes were commonly categorized into different groups and families based on the sequence difference of the kinase domain. The pecan kinome was divided into 20 different groups, and the RLK group was found to be the largest, containing more than half of the members (565) in the pecan kinome (Fig. 1), a similar phenomenon was also found in other flowering plants including Arabidopsis, grapevine, and rice (Supplemental Fig. S2)[6, 8, 27]. Interestingly, Chlamydomonas reinhardtii and Volvox carteri contained only two and three members in the RLK group, respectively. The large numbers of PK genes in flowering plants can be mainly attributed to the dramatic expansion of a few PK groups, especially the RLK group[28]. The number of subfamilies in the pecan kinome (121) was larger than that in pineapple (116), and similar to the soybean kinome (122)[7, 25].

      Duplication contributes to the evolution of novel gene functions including stress adaptation, disease resistance, and also makes major contributions to the large size of the RLK group in higher plants[16]. Over 90% of the increase in regulatory genes was caused by gene duplication in the Arabidopsis lineage[29]. In the pecan kinome, 29.47% (285/967) of the PK genes with 169 gene pairs were generated from segmental duplication, 145 of them were RLK genes and separated into 34 subfamilies (Supplemental Table S6), 10,530 paralogous pairs were found in the pecan genome[15]. Different families in the RLK group showed various expansion patterns, large families such as LRR and RLCK make important contributions to the expansion of the large size of the RLK group. Sixty-five and 49 PKs in LRR and RLCK families were generated from gene duplication, respectively, which is consistent with the previous results found in soybean[7]. The distribution of Ks values can be used to estimate the evolutionary date, more than 70% of duplicated genes in the pecan kinome occurred more recently (Fig. 4a). The ratio of Ka/Ks was commonly used to detect the history of selection pressure on coding sequences of duplicated genes[30]. In this study, Ka/Ks values of the 169 duplication events in the pecan kinome were less than 0.05, strong negative selection drove the evolution of the PKs in pecan (Fig. 4b). In a previous study, negative selection was also found to be the primary influence on PK genes in pineapple, negative selection indicated the process of removing deleterious mutations[31].

      PKs were generally related to the transmission of extracellular signals to the nucleus by activating or repressing target proteins, and subcellular localization information of PKs might help to explain protein's function[32]. We predicted the subcellular localization data of PKs in different groups, and about half of the RLK group members were located in the plasma membrane (Fig. 2), however, only 7% of PKs in RLCK families were membrane-located due to the absence of extracellular ligand-binding domains[33]. PKs in the non-RLK clade showed different subcellular localization features, such as most AGC group members were nucleus-located and more than 70% of CAMK group members were localized in the cytoplasm, similar results were also found in the pineapple kinome[25].

      Plant PKs, especially calcium-dependent protein kinases (CDPKs), mitogen-activated protein kinase (MAPK) cascades, sucrose non-fermenting1-related protein kinases (SnRKs), and RLKs have been well investigated and functionally analyzed in some model plants and crops[18, 19, 33]. To find the key genes more efficiently from the rice kinome with thousands of members, the rice kinase database (RKD) with PK genes in various tissues, under abiotic and biotic stresses was built[34]. However, limited expression information of PK genes is available for pecan. Expression levels might provide evidence of gene function, then RNA-Seq data of pecan PK genes were analyzed to obtain the central candidates during embryo development or response to drought stress. The expression patterns of pecan PK subfamilies during embryo development revealed many RLK subfamilies were down-regulated, especially some LRR subfamilies (Fig. 5), and this family has been found to play a role in embryo formation[35, 36].

      Drought stress could seriously impact food and energy security, and PKs have key functions in response to abiotic stresses including drought[21]. The expression data of PK subfamilies in pecan were analyzed under drought stress, while half of the RLK subfamilies performed low expression levels (Supplemental Fig. S5), these subfamilies also showed low expression in soybean and grapevine in response to drought[7, 8]. Furthermore, the differentially expressed genes in the pecan kinome were selected and divided into six clusters based on their different expression patterns, Cluster 5 contained 159 PK genes which were increased under drought stress, including three subfamilies such as CAMK_CAMKL-CHK1, CAMK_CDPK, and CAMK_OST1L in the CAMK group (Fig. 7). The CAMK_CAMKL-CHK1 subfamily, known as CBL-interacting protein kinase (CIPK), was involved in the drought stress response[21]. MdCIPK6L was up-regulated under drought stress, and the overexpression plants remarkably enhanced the tolerance to drought stress[37]. CcCBL1-CcCIPK14 module positively regulated drought tolerance via enhancing flavonoid biosynthesis in pigeon pea[38]. NtCIPK11 was up-regulated significantly in Nitraria tangutorum after mannitol treatment, and overexpression lines in Arabidopsis improved both drought and salt tolerance[39]. CDPK and CAMK_OST1L (named as SnRK2) genes have been proved to function in plant drought stress response[19, 40]. Among the 159 members in Cluster 5, 95 were RLK group genes and distributed in 28 subfamilies, which accounted for 59.75% (Supplemental Table S10). The receptor-like kinases activate the downstream signaling pathway via perceiving the extracellular signals and phosphorylating the targets, and drought stress caused the most notable effect on rice RLKs[41]. Intriguingly, nearly one-third of the genes in the largest subfamily, RLK-Pelle_DLSV, were found in Cluster 5, indicating this subfamily may play a key role in response to drought stress.

    • Plant protein kinases are important regulators of a variety of cellular processes including plant development and stress responses. In this study, a total of 967 PKs were annotated in the pecan genome, and divided into 121 subfamilies with 20 groups. Gene duplication functioned in the expansion of the pecan kinome, and the segmentally duplicated events suffered strong negative selection based on the Ka/Ks ratios. Moreover, different PK subfamilies in the pecan kinome performed dynamic transcript abundance during embryo development. In addition, pecan PK genes presented various expression patterns in response to drought, and most of them were differentially expressed. This research provides valuable information concerning pecan PKs, and lays a good foundation for further functional investigation of these genes during embryo development and drought stress responses.

    • All pecan protein and nucleotide sequences were downloaded from the GigaScience database (http://gigadb.org/dataset/100571)[15]. To uncover all the protein kinases in the pecan genome, Hidden Markov Models (HMMs) of the protein kinase clan including Pkinase (PF00069) and Pkinase_Tyr (PF07714) were downloaded from the Pfam website (http://pfam.xfam.org)[42]. HMMER software version 3.1b2 was used to investigate putative PKs, with an e-value cutoff of 1e-5[43]. Each candidate PK gene was further verified with the existence of the kinase domain using SMART (http://smart.embl-heidelberg.de)[44]. The putative PK was considered as a typical protein kinase if the domain alignments covered at least 50% of the kinase domain models[6].

      All identified protein kinases in Carya illinoinensis (pecan kinome) were classified into various groups, families, and subfamilies by HMMs constructed from a previous classification of 25 plant kinomes[6].

    • The kinase catalytic domain sequences of pecan PK proteins were retrieved using a perl script. Multiple sequence alignment was performed using MAFFT version 7 with the G-INS-I strategy (https://mafft.cbrc.jp/alignment/software)[45]. A Maximum-likelihood tree was constructed with the domain sequences using FastTree version 2.1 with the default setting (http://www.microbesonline.org/fasttree) to verify pecan kinome classification results[46].

    • Physical properties of the pecan PK proteins including molecular weight (MW), isoelectric points (pIs), and grand average of hydropathicity (GRAVY) were collected using online ExPASy ProtParam server (https://web.expasy.org/protparam).

    • To investigate the potential function of PKs in various cellular organelles, protein subcellular localization was predicted using CELLO v2.5 (http://cello.life.nctu.edu.tw)[47].

    • Intron numbers of all pecan PK genes were collected from the General Feature Format (gff) file from the GigaDB[15]. To analyze the domain organization patterns of the PKs, the Pfam database was used to identify the conserved domains according to the protein sequences of PKs with an e-value threshold of 1e-5.

    • OmicsBox software version 1.4 (https://www.biobam.com/omicsbox) was applied to analyze the Gene Ontology (GO) functional information. The annotations of GO terms were collected from Gene Ontology Consortium (http://geneontology.org).

    • All of the pecan PK sequences were searched against themselves by NCBI-BLAST 2.7.1+[48]. Then, segmental duplication events within the pecan kinome were investigated using Multiple Collinearity Scan toolkit (MCScanX) according to the manual[49].

    • The coding sequences (CDS) of the PK genes of duplication events were aligned with ClustalW[50]. To investigate the selection pressure of duplicate events, the non-synonymous substitutions (Ka) and synonymous substitutions (Ks) were calculated using TBtools software version 1.0971[51]. Ks values were further used to determine the date of duplication events, and the Ka/Ks ratios revealed the selection pressure of duplication events[25].

    • Publicly available transcriptome datasets were used to investigate the expression patterns of kinase genes in pecan. The expression data of PK genes in three key stages during embryo development in cultivar 'Pawnee' were retrieved with FPKM values (fragments per kilobase per million of reads mapped) from the NCBI database (BioProject number: PRJNA435846)[15].

    • One-year-old pecan seedlings, propagated from seeds (collected from 'Pawnee' trees in October) were selected as rootstock, and the genome sequenced cultivar 'Pawnee' was used as scion. Patch budding was selected and used for pecan grafting in August. After 12 months, the grafted plants were moved to a growth chamber with 14 h light at 24 °C/10 h dark at 22 °C photoperiods. The grafted plants were grown in pots under well-watered conditions for 30 d, then water was withheld for 15 d. On each grafted plant, a single compound leaf from the top was selected, and the second set of leaflets from the apex of this compound leaf were collected. Plant leaf samples were harvested at 0, 3, 6, 9, 12, and 15 d after drought treatment. The harvested samples were frozen in liquid nitrogen immediately, then stored at −70 °C to prevent RNA degradation until RNA isolation was carried out.

    • Three biological replicates of pecan leaf samples under drought treatment were harvested and applied for RNA-Seq experiments, and each biological replicate was collected from at least three grafted plants. Total RNA was isolated using Trizol reagent (Invitrogen, Carlsbad, USA) following the manufacturer's instructions. RNA quality was detected using RNase-free agarose gel electrophoresis and NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Wilmington, USA). Totally, 1 μg RNA per sample was reverse transcribed to cDNA, and cDNA libraries were sequenced using an Illumina Novaseq 6000 platform (GeneDenovo, Guangzhou, China).

      Reference-guided mapping was performed with the latest genome assembly of pecan[15]. The index of the pecan reference genome was built, and clean reads were aligned to the pecan reference genome using HISAT2.2.4[52]. The mapped reads were assembled using StringTie version 1.3.1 in a reference-based method[53]. The expression abundance of pecan PK genes was quantified by calculating the FPKM value using RSEM software[54]. The raw data of RNA-Seq have been deposited and made available in NCBI with the accession number GSE179336.

      Differential expression analysis of RNA-Seq data between the control (0 d) and drought-treated datasets at 3, 6, 9, 12, and 15 d was presented by DESeq2 software[55], the PK genes with the parameter of FDR (false discovery rate) < 0.05 and the absolute value of log2 Ratio ≥ 1 were considered as differentially expressed genes.

    • Genes in the pecan kinome were classified into different clusters based on their expression patterns (p < 0.05) using the Short Time-series Expression Miner (STEM) software (http://www.cs.cmu.edu/~jernst/stem)[56].

    • To investigate the topological relationships between pecan PK subfamilies, the co-expression networks were constructed using the Pearson correlation coefficient (PCC) based on the expression profile of pecan PK genes during drought stress response using IBM SPSS software version 25 (https://www.ibm.com/products/spss-statistics). All of the gene expression data of PKs in each subfamily were averaged, and the subfamily pairs with absolute values of PCC higher than 0.9 were retrieved at the 0.01 significance level (p-value) and used for co-expression network analysis. The networks were eventually visualized using Cytoscape software version 3.7.1 (https://cytoscape.org)[57].

      • This study was supported by a grant from the Natural Science Foundation of Jiangsu Province (BK20190749), also supported by the National Key R&D Program of China (2018YFD1000604), Postdoctoral Research Funding Program of Jiangsu Province (013010428) and the Startup Foundation of Nanjing Forestry University (163010226).
      • The authors declare that they have no conflict of interest.
      • Supplemental Table S1 Kinase domain annotation of 967 pecan PKs.
      • Supplemental Table S2 Family classification of pecan PKs with related information.
      • Supplemental Table S3 Domain organization of pecan PKs.
      • Supplemental Table S4 List of pecan protein kinases containing multiple kinase domains.
      • Supplemental Table S5 GO IDs of pecan PKs.
      • Supplemental Table S6 Segmental duplication events and Ka/Ks values of pecan protein kinases.
      • Supplemental Table S7 FPKM values of pecan PK genes during embryo development.
      • Supplemental Table S8 Genes in eight groups with different expression patterns during pecan embryo developmen.
      • Supplemental Table S9 Average FPKM expression values of pecan PK genes under drought stress.
      • Supplemental Table S10 Differentially expressed PK genes in six clusters with different expression patterns under drought stress.
      • Supplemental Fig. S1 Phylogenetic classification of pecan PKs. The phylogenetic tree was generated with amino sequences of the kinase domain with maximum-likelihood method. PK subfamilies were highlighted with different colors.
      • Supplemental Fig. S2 Phylogenetic classification of PKs in four different species. The phylogenetic tree was generated with amino sequences of the kinase domain from four different species (967 from pecan, 1006 from Arabidopsis, 1168 from grape and 758 from pineapple) with maximum-likelihood method.
      • Supplemental Fig. S3 Expression of PK genes during embryo development in pecan. Log2 (FPKM+1) values were performed according to the red-white-blue color scale, and the heatmaps were generated using the R language with hierarchical clustering.
      • Supplemental Fig. S4 Different expression patterns of pecan PK genes during embryo development.
      • Supplemental Fig. S5 Expression data of PK subfamilies with drought treatment in pecan. Log2 (FPKM+1) values were performed according to the red-white-blue color scale, and the heatmaps were generated using R language with hierarchical clustering.
      • Copyright: © 2021 by the author(s). Exclusive Licensee Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.
    Figure (7)  References (57)
  • About this article
    Cite this article
    Zhu K, Fan P, Liu H, Zhao J, Tan P, et al. 2021. Pecan kinome: classification and expression analysis of all protein kinases in Carya illinoinensis. Forestry Research 1: 14 doi: 10.48130/FR-2021-0014
    Zhu K, Fan P, Liu H, Zhao J, Tan P, et al. 2021. Pecan kinome: classification and expression analysis of all protein kinases in Carya illinoinensis. Forestry Research 1: 14 doi: 10.48130/FR-2021-0014

Catalog

  • About this article

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return