2022 Volume 2
ARTICLE   Open Access    

Genotyping by sequencing reveals lack of local genetic structure between two German Ips typographus L. populations

  • The European spruce bark beetle (Ips typographus L.) is a serious pest in Norway spruce stands. Although it usually attacks freshly fallen trees or trees with weakened defenses, healthy trees can also be infested during massive outbreaks, which can occur after catastrophic events such as drought periods or storms. Knowledge of the genetic structure of this species, especially on local scales, is still ambiguous. While local population structure was reported in some studies, others did not detect any differentiation among I. typographus populations. Here, we used genotyping by sequencing to infer the genetic structure of two I. typographus populations in western Germany that were approx. 58 km apart. Based on 16,830 SNPs, we detected high genetic diversity but very low genetic differentiation between the populations (FST: 0.001) and a lack of population structure. These results suggest a high dispersal ability of I. typographus.
  • Aquaporins (AQPs) constitute a large family of transmembrane channel proteins that regulate intracellular and intercellular water flow[1,2]. Since their discovery in the 1990s, AQPs have been found not only in all three domains of life, i.e., bacteria, eukaryotes, and archaea, but also in viruses[3,4]. Each AQP monomer is composed of an internal repeat of three transmembrane helices (six in total, TM1–TM6) as well as two half helices that are formed by loop B (LB) and loop E (LE) dipping into the membrane[5]. The dual Asn-Pro-Ala (NPA) motifs, located at the N-termini of the two half helices, act as a size barrier of the pore and create an electrostatic repulsion of protons, whereas the so-called aromatic/arginine (ar/R) selectivity filter (i.e., H2, H5, LE1, and LE2) determines substrate specificity by rendering the pore constriction site diverse in both size and hydrophobicity[5–9]. Based on sequence similarity, AQPs in higher plants can be divided into five subfamilies, i.e., plasma membrane intrinsic protein (PIP), tonoplast intrinsic protein (TIP), NOD26-like intrinsic protein (NIP), X intrinsic protein (XIP), and small basic intrinsic protein (SIP)[10–17]. Among them, PIPs, which are typically localized in the cell membrane, are the most conserved and play a central role in controlling plant water status[12,18–22]. Of the two phylogenetic groups present in the PIP subfamily, PIP1 possesses a relatively longer N-terminus, whereas PIP2 features an extended C-terminus with one or more conserved S residues for phosphorylation modification[5,15,17].

    Tigernut (Cyperus esculentus L.), which belongs to the Cyperaceae family within Poales, is a novel and promising herbaceous C4 oil crop with wide adaptability, large biomass, and a short life period[23–27]. Tigernut is a unique species that accumulates up to 35% oil in its underground tubers[28–30], which develop from stolons through three main stages, i.e., initiation, swelling, and maturation[31–33]. Water is essential for tuber development, and tuber moisture content remains at a relatively high level of approximately 85% until maturation, when a significant drop to about 45% is observed[28,32]. Uncovering the mechanism of tuber water balance is therefore of particular interest. Despite the crucial roles of PIPs in cell water balance, their characterization in tigernut is still in its infancy[21]. The recently available genome and transcriptome datasets[31,33,34] provide an opportunity to address this issue.

    In this study, a global characterization of PIP genes was conducted in tigernut, including gene localizations, gene structures, sequence characteristics, and evolutionary patterns. Moreover, the correlation of CePIP mRNA/protein abundance with water content during tuber development as well as subcellular localizations were also investigated, which facilitates further elucidation of the water balance mechanism in this special species.

    PIP genes reported in Arabidopsis (Arabidopsis thaliana)[10] and rice (Oryza sativa)[11] were obtained from TAIR11 (www.arabidopsis.org) and RGAP7 (http://rice.uga.edu), respectively, and detailed information is shown in Supplemental Table S1. Their protein sequences were used as queries for tBLASTn[35] searches (E-value of 1e-10) of the full-length tigernut transcriptome and genome sequences that were accessed from CNGBdb (https://db.cngb.org/search/assembly/CNA0051961)[31,34]. RNA sequencing (RNA-seq) reads available in NCBI (www.ncbi.nlm.nih.gov/sra) were also adopted for gene structure revision as described before[13], and the presence of the conserved MIP (major intrinsic protein, Pfam accession number PF00230) domain in candidates was confirmed using MOTIF Search (www.genome.jp/tools/motif). To uncover the origin and evolution of CePIP genes, a similar approach was also employed to identify homologs from representative plant species, i.e., Carex cristatella (v1, Cyperaceae)[36], Rhynchospora breviuscula (v1, Cyperaceae)[37], and Juncus effusus (v1, Juncaceae)[37], whose genome sequences were accessed from NCBI (www.ncbi.nlm.nih.gov). Gene structures of candidates were displayed using GSDS 2.0 (http://gsds.gao-lab.org), whereas physicochemical parameters of deduced proteins were calculated using ProtParam (http://web.expasy.org/protparam). Subcellular localization prediction was conducted using WoLF PSORT (www.genscript.com/wolf-psort.html).
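
    The E-value threshold used in the homology search can be illustrated with a minimal sketch. It assumes that hits were exported in the standard 12-column BLAST tabular layout (E-value in the 11th column); the function name and the two example rows are hypothetical and are not results from this study.

```python
from typing import Iterable, List, Tuple

E_VALUE_CUTOFF = 1e-10  # threshold stated in the text


def filter_hits(lines: Iterable[str],
                cutoff: float = E_VALUE_CUTOFF) -> List[Tuple[str, str, float]]:
    """Return (query, subject, evalue) for hits at or below the cutoff.

    Assumes the standard 12-column tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= cutoff:
            kept.append((query, subject, evalue))
    return kept


# Hypothetical rows for illustration only (not real search output).
example = [
    "AtPIP1;1\tScfA\t80.0\t280\t56\t0\t1\t280\t100\t939\t1e-150\t420",
    "AtPIP1;1\tScfB\t30.0\t60\t42\t0\t1\t60\t10\t189\t0.002\t35",
]
hits = filter_hits(example)
print(hits)
```

    Only the first hypothetical row passes the filter; candidates retained in this way would still need the MIP domain check described above.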

    Nucleotide and protein multiple sequence alignments were conducted using ClustalW and MUSCLE, respectively, as implemented in MEGA6[38] with default parameters, and phylogenetic tree construction was carried out using MEGA6 with the maximum likelihood method and a bootstrap of 1,000 replicates. Systematic names of PIP genes were assigned with two italic letters denoting the source organism and a progressive number based on sequence similarity. Conserved motifs were identified using MEME Suite 5.5.3 (https://meme-suite.org/tools/meme) with optimized parameters as follows: any number of repetitions, a maximum of 15 motifs, and a width of 6–250 residues for each motif. TMs and conserved residues were identified using homology modeling and sequence alignment with the structure-resolved spinach (Spinacia oleracea) SoPIP2;1[5].

    Synteny analysis was conducted using TBtools-II[39] as described previously[40], where the parameters were set as E-value of 1e-10 and BLAST hits of 5. Duplication modes were identified using the DupGen_finder pipeline[41], and Ks (synonymous substitution rate) and Ka (nonsynonymous substitution rate) of duplicate pairs were calculated using codeml in the PAML package[42]. Orthologs between different species were identified using InParanoid[43] and information from synteny analysis, and orthogroups (OGs) were assigned only when they were present in at least two species examined.
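
    The rule that orthogroups were assigned only when present in at least two species can be sketched as a simple filter. The data structure, function name, and group/gene labels below are hypothetical placeholders; the actual candidate groups came from InParanoid and synteny information.

```python
from typing import Dict, List


def assign_orthogroups(candidates: Dict[str, Dict[str, List[str]]],
                       min_species: int = 2) -> Dict[str, Dict[str, List[str]]]:
    """Keep candidate groups that have genes in at least `min_species` species."""
    kept = {}
    for name, members in candidates.items():
        # Count only species that actually contribute at least one gene.
        species_with_genes = [sp for sp, genes in members.items() if genes]
        if len(species_with_genes) >= min_species:
            kept[name] = members
    return kept


# Hypothetical candidate groups (placeholder names, not study data).
candidates = {
    "groupA": {"species1": ["gene1"], "species2": ["gene2", "gene3"]},
    "groupB": {"species1": ["gene4"], "species2": []},
}
orthogroups = assign_orthogroups(candidates)
print(list(orthogroups))
```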

    Plant materials used for gene cloning, qRT-PCR analysis, and 4D-parallel reaction monitoring (4D-PRM)-based protein quantification were derived from the tigernut variety Reyan3[31], and plants were grown in a greenhouse as described previously[25]. For expression profiling during leaf development, three representative stages, i.e., young, mature, and senescing, were selected and the chlorophyll content was checked using a SPAD-502Plus (Konica Minolta, Shanghai, China) as previously described[44]. Young and senescing leaves are yellow in appearance, and their chlorophyll contents are only half that of mature leaves, which are dark green. To examine diurnal fluctuation, mature leaves were sampled every 4 h from the onset of light at 8 a.m. To examine gene regulation during tuber development, fresh tubers at 1, 5, 10, 15, 20, 25, and 35 d after tuber initiation (DAI) were collected as described previously[32]. All samples, with three biological replicates each, were quickly frozen in liquid nitrogen and stored at −80 °C for further use. For subcellular localization analysis, tobacco (Nicotiana benthamiana) plants were grown as previously described[20].

    Tissue-specific expression profiles of CePIP genes were investigated using Illumina RNA-seq samples (150 bp paired-end reads) with three biological replicates for young leaf, mature leaf, sheath of mature leaf, shoot apex, root, rhizome, and three stages of developing tuber (40, 85, and 120 d after sowing (DAS)), which are available under the NCBI accession number PRJNA703731. Raw sequence reads in FASTQ format were obtained using fastq-dump, and quality control was performed using FastQC (www.bioinformatics.babraham.ac.uk/projects/fastqc). Read mapping was performed using HISAT2 (v2.2.1, https://daehwankimlab.github.io/hisat2), and relative gene expression level was presented as FPKM (fragments per kilobase of exon per million fragments mapped)[45].
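
    The FPKM unit follows directly from its definition. The sketch below is a minimal illustration with made-up counts; the function name and the input values are hypothetical, and the software actually used to compute FPKM is not specified in the text.

```python
def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million fragments mapped."""
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and library size must be positive")
    # Dividing by (length / 1e3) and by (library size / 1e6) gives the 1e9 factor.
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


# Hypothetical numbers for illustration only.
value = fpkm(fragments=500, exon_length_bp=1000, total_mapped_fragments=20_000_000)
print(value)
```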

    For qRT-PCR analysis, total RNA extraction and synthesis of the first-strand cDNA were conducted as previously described[24]. Primers used in this study are shown in Supplemental Table S2, where CeUCE2 and CeTIP41[25,33] were employed as two reference genes. PCR reactions in triplicate for each biological sample were carried out using the SYBR-green Mix (Takara) on a Real-time Thermal Cycler Type 5100 (Thermo Fisher Scientific Oy). Relative gene abundance was estimated with the 2^(−ΔΔCT) method and statistical analysis was performed using SPSS Statistics 20 as described previously[13].
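
    The relative quantification step can be sketched as follows. How the two reference genes were combined is not stated in the text; averaging their CT values (equivalent to a geometric mean of their quantities) is an assumption made here, and the function name and CT values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(ct_target_sample: float,
                        ct_refs_sample: Sequence[float],
                        ct_target_calibrator: float,
                        ct_refs_calibrator: Sequence[float]) -> float:
    """Fold change by the 2^(-ddCT) method.

    Assumption: multiple reference genes are combined by averaging their CT values.
    """
    delta_sample = ct_target_sample - mean(ct_refs_sample)
    delta_calibrator = ct_target_calibrator - mean(ct_refs_calibrator)
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


# Hypothetical CT values: the target amplifies two cycles earlier (relative to
# the references) in the sample than in the calibrator.
fold = relative_expression(22.0, (20.0, 20.0), 24.0, (20.0, 20.0))
print(fold)
```

    With these made-up values the target is four-fold more abundant in the sample than in the calibrator, since each cycle of difference corresponds to a factor of two under the method's assumption of perfect amplification efficiency.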

    Raw proteomic data for tigernut roots, leaves, and four stages of tubers (freshly harvested, dried, rehydrated for 48 h, and sprouted) were downloaded from ProteomeXchange/PRIDE (www.proteomexchange.org, PXD021894, PXD031123, and PXD035931) and further analyzed using MaxQuant (v1.6.15.0, www.maxquant.org). Three dominant members, i.e., CePIP1;1, -2;1, and -2;8, were selected for 4D-PRM quantification analysis, and related unique peptides are shown in Supplemental Table S3. Protein extraction, trypsin digestion, and LC-MS/MS analysis were conducted as described previously[46].

    For subcellular localization analysis, the coding regions (CDS) of CePIP1;1, -2;1, and -2;8 were cloned into pNC-Cam1304-SubN via Nimble Cloning as described before[30]. Then, recombinant plasmids were introduced into Agrobacterium tumefaciens GV3101 with the helper plasmid pSoup-P19, and infiltration of 4-week-old tobacco leaves was performed as previously described[20]. The plasma membrane marker HbPIP2;3-RFP[22] was co-transformed as a positive control. Fluorescence observation was conducted using confocal laser scanning microscopy (Zeiss LSM880, Germany): the wavelength of laser-1 was set as 730 nm for RFP observation, where the fluorescence was excited at 561 nm; the wavelength of laser-2 was set as 750 nm for EGFP observation, where the fluorescence was excited at 488 nm; and the wavelength of laser-3 was set as 470 nm for chlorophyll autofluorescence observation, where the fluorescence was excited at 633 nm.

    As shown in Table 1, a total of 14 PIP genes were identified from eight tigernut scaffolds (Scfs). The CDS length varies from 831 to 882 bp, putatively encoding 276–293 amino acids (AA) with a molecular weight (MW) of 29.16–31.59 kilodalton (kDa). The theoretical isoelectric point (pI) varies from 7.04 to 9.46, implying that they are all alkaline. The grand average of hydropathicity (GRAVY) is between 0.344 and 0.577, and the aliphatic index (AI) ranges from 94.57 to 106.90, which are consistent with the hydrophobic characteristic of AQPs[47]. As expected, like SoPIP2;1, all CePIPs include six TMs, two typical NPA motifs, the invariable ar/R filter F-H-T-R, five conserved Froger's positions Q/M-S-A-F-W, and two highly conserved residues corresponding to H193 and L197 in SoPIP2;1 that were proven to be involved in gating[5,48], though the H→F variation was found in CePIP2;9, -2;10, and -2;11 (Supplemental Fig. S1). Moreover, two S residues, corresponding to S115 and S274 in SoPIP2;1[5], respectively, were also found in the majority of CePIPs (Supplemental Fig. S1), implying their posttranslational regulation by phosphorylation.

    Table 1.  Fourteen PIP genes identified in C. esculentus.
    Gene name Locus Position Intron no. AA MW (kDa) pI GRAVY AI TM MIP
    CePIP1;1 CESC_15147 Scf9:2757378..2759502(–) 3 288 30.76 8.82 0.384 95.28 6 47..276
    CePIP1;2 CESC_04128 Scf4:3806361..3807726(–) 3 291 31.11 8.81 0.344 95.95 6 46..274
    CePIP1;3 CESC_15950 Scf54:5022493..5023820(+) 3 289 31.06 8.80 0.363 94.57 6 49..278
    CePIP2;1 CESC_15350 Scf9:879960..884243(+) 3 288 30.34 8.60 0.529 103.02 6 33..269
    CePIP2;2 CESC_00011 Scf30:4234620..4236549(+) 3 293 31.59 9.27 0.394 101.57 6 35..268
    CePIP2;3 CESC_00010 Scf30:4239406..4241658(+) 3 291 30.88 9.44 0.432 98.97 6 31..266
    CePIP2;4 CESC_05080 Scf46:307799..309544(+) 3 285 30.44 7.04 0.453 100.32 6 28..265
    CePIP2;5 CESC_05079 Scf46:312254..314388(+) 3 286 30.49 7.04 0.512 101.68 6 31..268
    CePIP2;6 CESC_05078 Scf46:316024..317780(+) 3 288 30.65 7.68 0.475 103.06 6 31..268
    CePIP2;7 CESC_05077 Scf46:320439..322184(+) 3 284 30.12 8.55 0.500 100.00 6 29..266
    CePIP2;8 CESC_14470 Scf2:4446409..4448999(+) 3 284 30.37 8.30 0.490 106.90 6 33..263
    CePIP2;9 CESC_02223 Scf1:2543928..2545778(–) 3 283 30.09 9.46 0.533 106.47 6 31..262
    CePIP2;10 CESC_10007 Scf27:1686032..1688010(–) 3 276 29.16 9.23 0.560 106.05 6 26..256
    CePIP2;11 CESC_10009 Scf27:1694196..1696175(–) 3 284 29.71 9.10 0.577 105.49 6 33..263
    AA: amino acid; AI: aliphatic index; GRAVY: grand average of hydropathicity; kDa: kilodalton; MIP: major intrinsic protein; MW: molecular weight; pI: isoelectric point; PIP: plasma membrane intrinsic protein; Scf: scaffold; TM: transmembrane helix.
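
    The sequence-derived parameters above can be reproduced in principle with a short sketch. The reported CDS and protein lengths (831 bp vs 276 AA; 882 bp vs 293 AA) are consistent with the CDS length including the stop codon, which is the assumption made here. GRAVY uses the Kyte-Doolittle hydropathy scale and the aliphatic index uses the standard mole-percent formula; the function names and the toy peptides are hypothetical, and the values in Table 1 were computed with ProtParam, not with this code.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def cds_to_aa_count(cds_bp: int) -> int:
    """Protein length from CDS length, assuming the CDS includes the stop codon."""
    if cds_bp % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return cds_bp // 3 - 1


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


print(cds_to_aa_count(831), cds_to_aa_count(882))
# Toy peptides for illustration only (not CePIP sequences).
print(gravy("AV"), aliphatic_index("AVIL"))
```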

    To uncover the evolutionary relationships, an unrooted phylogenetic tree was constructed using the full-length protein sequences of CePIPs together with 11 OsPIPs and 13 AtPIPs. As shown in Fig. 1a, these proteins were clustered into two main groups, corresponding to PIP1 and PIP2 as previously defined[10,49], and each appears to have evolved into several subgroups. Compared with PIP1s, PIP2s possess a relatively shorter N-terminus but an extended C-terminus with one conserved S residue (Supplemental Fig. S1). Interestingly, a high number of gene repeats were detected, most of which seem to be species-specific, i.e., AtPIP1;1/-1;2/-1;3/-1;4/-1;5, AtPIP2;1/-2;2/-2;3/-2;4/-2;5/-2;6, AtPIP2;7/-2;8, OsPIP1;1/-1;2/-1;3, OsPIP2;1/-2;4/-2;5, OsPIP2;2/-2;3, CePIP1;1/-1;2, CePIP2;2/-2;3, CePIP2;4/-2;5/-2;6/-2;7, and CePIP2;9/-2;10/-2;11, reflecting the occurrence of more than one lineage-specific whole-genome duplication (WGD) after their divergence[50,51]. In Arabidopsis, which experienced three WGDs (i.e., γ, β, and α) after the split with the monocot clade[52], AtPIP1;5 in the PIP1 group first gave rise to AtPIP1;1 via the γ WGD shared by all core eudicots[50], which later resulted in AtPIP1;3, -1;4, and -1;2 via the β and α WGDs; AtPIP2;1 in the PIP2 group first gave rise to AtPIP2;6 via the γ WGD, and they later generated AtPIP2;2 and -2;5 via the α WGD (Supplemental Table S1). In rice, which also experienced three WGDs (i.e., τ, σ, and ρ) after the split with the eudicot clade[51], OsPIP1;2 and -2;3 generated OsPIP1;1 and -2;2 via the Poaceae-specific ρ WGD, respectively. Additionally, tandem, proximal, transposed, and dispersed duplications also played a role in the gene expansion in these two species (Supplemental Table S1).

    Figure 1.  Structural and phylogenetic analysis of PIPs in C. esculentus, O. sativa, and A. thaliana. (a) Shown is an unrooted phylogenetic tree resulting from full-length PIPs with MEGA6 (maximum likelihood method and bootstrap of 1,000 replicates), where the distance scale denotes the number of amino acid substitutions per site. (b) Shown are the exon-intron structures. (c) Shown is the distribution of conserved motifs among PIPs, where different motifs are represented by different color blocks as indicated and the same color block in different proteins indicates a certain motif. (At: A. thaliana; Ce: C. esculentus; PIP: plasma membrane intrinsic protein; Os: O. sativa).

    Analysis of gene structures revealed that all CePIP and AtPIP genes possess three introns and four exons in the CDS, in contrast to the frequent loss of certain introns in rice, including OsPIP1;2, -1;3, -2;1, -2;3, -2;4, -2;5, -2;6, -2;7, and -2;8 (Fig. 1b). The positions of three introns are highly conserved, which are located in sequences encoding LB (three residues before the first NPA), LD (one residue before the conserved L involved in gating), and LE (18 residues after the second NPA), respectively (Supplemental Fig. S1). The intron length of CePIP genes is highly variable, i.e., 109–993 bp, 115–1745 bp, and 95–866 bp for three introns, respectively. By contrast, the exon length is relatively less variable: Exons 2 and 3 are invariable with 296 bp and 141 bp, respectively, whereas Exons 1 and 4 are of 277–343 bp and 93–132 bp, determining the length of N- and C-terminus of PIP1 and PIP2, respectively (Fig. 1b). Correspondingly, their protein structures were shown to be highly conserved, and six (i.e., Motifs 1–6) out of 15 motifs identified are broadly present. Among them, Motif 3, -2, -6, -1, and -4 constitute the conserved MIP domain. In contrast to a single Motif 5 present in most PIP2s, all PIP1s possess two sequential copies of Motif 5, where the first one is located at the extended N-terminal. In CePIP2;3 and OsPIP2;7, Motif 5 is replaced by Motif 13; in CePIP2;2, it is replaced by two copies of Motif 15; and no significant motif was detected in this region of CePIP2;10. PIP1s and PIP2s usually feature Motif 9 and -7 at the C-terminal, respectively, though it is replaced by Motif 12 in CePIP2;6 and OsPIP2;8. PIP2s usually feature Motif 8 at the N-terminal, though it is replaced by Motif 14 in CePIP2;2 and -2;3 or replaced by Motif 11 in CePIP2;10 and -2;11 (Fig. 1c).

    As shown in Fig. 2a, gene localization of CePIPs revealed three gene clusters, i.e., CePIP2;2/-2;3 on Scf30, CePIP2;4/-2;5/-2;6/-2;7 on Scf46, and CePIP2;10/-2;11 on Scf27, which were defined as tandem repeats for their high sequence similarities and neighboring locations. The nucleotide identities of these duplicate pairs vary from 70.5% to 91.2%, and the Ks values range from 0.0971 to 1.2778 (Table 2), implying that they arose at different times. According to intra-species synteny analysis, two duplicate pairs, i.e., CePIP1;1/-1;2 and CePIP2;2/-2;4, were shown to be located within syntenic blocks (Fig. 2b) and thus were defined as WGD repeats. Among them, CePIP1;1/-1;2 possess a Ks value comparable to those of CePIP2;2/-2;3, CePIP1;1/-1;3, and CePIP2;4/-2;8 (1.2522 vs 1.2287–1.2778), whereas CePIP2;2/-2;4 harbor a relatively higher Ks value of 1.5474 (Table 2), implying an earlier origin or faster evolution of the latter. While CePIP1;1/-1;3 and CePIP2;1/-2;8 were characterized as transposed repeats, CePIP2;1/-2;2, CePIP2;9/-2;10, and CePIP2;8/-2;10 were characterized as dispersed repeats (Fig. 2a). The Ks values of the three dispersed repeats vary from 0.8591 to 3.0117 (Table 2), implying distinct times of origin.

    Figure 2.  Duplication events of CePIP genes and synteny analysis within and between C. esculentus, O. sativa, and A. thaliana. (a) Duplication events detected in tigernut. Serial numbers are indicated at the top of each scaffold, and the scale is in Mb. Duplicate pairs identified in this study are connected using lines in different colors, i.e., tandem (shown in green), transposed (shown in purple), dispersed (shown in gold), and WGD (shown in red). (b) Synteny analysis within and between C. esculentus, O. sativa, and A. thaliana. (c) Synteny analysis within and between C. esculentus, C. cristatella, R. breviuscula, and J. effusus. Shown are PIP-encoding chromosomes/scaffolds and only syntenic blocks that contain PIP genes are marked, i.e., red and purple for intra- and inter-species, respectively. (At: A. thaliana; Cc: C. cristatella; Ce: C. esculentus; Je: J. effusus; Mb: megabase; PIP: plasma membrane intrinsic protein; Os: O. sativa; Rb: R. breviuscula; Scf: scaffold; WGD: whole-genome duplication).
    Table 2.  Sequence identity and evolutionary rate of homologous PIP gene pairs identified in C. esculentus. Ks and Ka were calculated using PAML.
    Duplicate 1 Duplicate 2 Identity (%) Ka Ks Ka/Ks
    CePIP1;1 CePIP1;3 78.70 0.0750 1.2287 0.0610
    CePIP1;2 CePIP1;1 77.20 0.0894 1.2522 0.0714
    CePIP2;1 CePIP2;4 74.90 0.0965 1.7009 0.0567
    CePIP2;3 CePIP2;2 70.50 0.1819 1.2778 0.1424
    CePIP2;4 CePIP2;2 66.50 0.2094 1.5474 0.1353
    CePIP2;5 CePIP2;4 87.30 0.0225 0.4948 0.0455
    CePIP2;6 CePIP2;5 84.90 0.0545 0.5820 0.0937
    CePIP2;7 CePIP2;6 78.70 0.0894 1.0269 0.0871
    CePIP2;8 CePIP2;4 72.90 0.1401 1.2641 0.1109
    CePIP2;9 CePIP2;10 76.40 0.1290 0.8591 0.1502
    CePIP2;10 CePIP2;8 64.90 0.2432 3.0117 0.0807
    CePIP2;11 CePIP2;10 91.20 0.0562 0.0971 0.5783
    Ce: C. esculentus; Ka: nonsynonymous substitution rate; Ks: synonymous substitution rate; PIP: plasma membrane intrinsic protein.
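
    The Ka/Ks column of Table 2 is simply the ratio of the two preceding columns, which the following minimal sketch recomputes for three rows copied from the table. Small deviations from the reported ratios are expected because the published Ka and Ks values are rounded; the tolerance of 0.001 and the variable names are assumptions made for this illustration. A ratio below 1 is conventionally read as a sign of purifying selection, which is a general interpretation rather than a claim made in the text.

```python
# (duplicate 1, duplicate 2, Ka, Ks, reported Ka/Ks), copied from Table 2.
ROWS = [
    ("CePIP1;1", "CePIP1;3", 0.0750, 1.2287, 0.0610),
    ("CePIP2;5", "CePIP2;4", 0.0225, 0.4948, 0.0455),
    ("CePIP2;11", "CePIP2;10", 0.0562, 0.0971, 0.5783),
]

TOLERANCE = 1e-3  # assumed allowance for rounding of the published Ka and Ks

recomputed = {}
for dup1, dup2, ka, ks, reported in ROWS:
    ratio = ka / ks
    recomputed[(dup1, dup2)] = ratio
    status = "ok" if abs(ratio - reported) < TOLERANCE else "check"
    print(f"{dup1}/{dup2}: Ka/Ks = {ratio:.4f} (reported {reported:.4f}) {status}")

# Pair with the highest ratio among the rows shown here.
highest_pair = max(recomputed, key=recomputed.get)
print(highest_pair)
```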

    According to inter-species syntenic analysis, six out of 14 CePIP genes were shown to have syntelogs in rice, including 1:1, 1:2, and 2:2 relationships (i.e., CePIP1;1 vs OsPIP1;3, CePIP1;3 vs OsPIP1;2/-1;1, CePIP2;1 vs OsPIP2;4, CePIP2;2/-2;4 vs OsPIP2;3/-2;2, and CePIP2;8 vs OsPIP2;6), in striking contrast to a single one found in Arabidopsis (i.e., CePIP1;2 vs AtPIP1;2). Correspondingly, only OsPIP1;2 in rice was shown to have syntelogs in Arabidopsis, i.e., AtPIP1;3 and -1;4 (Fig. 2b). These results are consistent with the closer taxonomic relationship between tigernut and rice[50,51], and also imply lineage-specific evolution after their divergence.

    As described above, phylogenetic and syntenic analyses showed that the last common ancestor of tigernut and rice is more likely to have possessed only two PIP1s and three PIP2s. However, it is not clear whether the gene expansion observed in tigernut is species-specific or Cyperaceae-specific. To address this issue, recently available genomes were used to identify PIP subfamily genes from C. cristatella, R. breviuscula, and J. effusus, resulting in 15, 13, and nine members, respectively. Interestingly, in contrast to the high number of tandem repeats found in Cyperaceae species, only one pair of tandem repeats (i.e., JePIP2;3 and -2;4) was identified in J. effusus, a close outgroup species to Cyperaceae in the Juncaceae family[36,37]. According to homology analysis, a total of 12 orthogroups were identified, where JePIP genes belong to PIP1A (JePIP1;1), PIP1B (JePIP1;2), PIP1C (JePIP1;3), PIP2A (JePIP2;1), PIP2B (JePIP2;2), PIP2F (JePIP2;3 and -2;4), PIP2G (JePIP2;5), and PIP2H (JePIP2;6) (Table 3). Further intra-species syntenic analysis revealed that JePIP1;1/-1;2 and JePIP2;2/-2;3 are located within syntenic blocks, which is consistent with CePIP1;1/-1;2, CePIP2;2/-2;4, CcPIP1;1/-1;2, CcPIP2;3/-2;4, RbPIP1;1/-1;2, and RbPIP2;2/-2;5 (Fig. 2c), implying that PIP1A/PIP1B and PIP2B/PIP2D were derived from WGDs that occurred sometime before the Cyperaceae-Juncaceae divergence. After the split with Juncaceae, tandem duplications frequently occurred in Cyperaceae, where PIP2B/PIP2C and PIP2D/PIP2E/PIP2F are retained in most Cyperaceae plants examined in this study. In addition, species-specific expansion was also observed, i.e., CePIP2;4/-2;5, CePIP2;10/-2;11, CcPIP1;2/-1;3, CcPIP2;4/-2;5, CcPIP2;8/-2;9, CcPIP2;10/-2;11, RbPIP2;3/-2;4, and RbPIP2;9/-2;10 (Table 3 & Fig. 2c).

    Table 3.  Twelve proposed orthogroups based on comparison of representative plant species.
    Orthogroup | C. esculentus | C. cristatella | R. breviuscula | J. effusus | O. sativa | A. thaliana
    PIP1A | CePIP1;1 | CcPIP1;1 | RbPIP1;1 | JePIP1;1 | OsPIP1;3 | AtPIP1;1, AtPIP1;2, AtPIP1;3, AtPIP1;4, AtPIP1;5
    PIP1B | CePIP1;2 | CcPIP1;2, CcPIP1;3 | RbPIP1;2 | JePIP1;2 | - | -
    PIP1C | CePIP1;3 | CcPIP1;4 | RbPIP1;3 | JePIP1;3 | OsPIP1;1, OsPIP1;2 | -
    PIP2A | CePIP2;1 | CcPIP2;1 | RbPIP2;1 | JePIP2;1 | OsPIP2;1, OsPIP2;4, OsPIP2;5 | AtPIP2;1, AtPIP2;2, AtPIP2;3, AtPIP2;4, AtPIP2;5, AtPIP2;6
    PIP2B | CePIP2;2 | CcPIP2;2 | RbPIP2;2 | JePIP2;2 | OsPIP2;2, OsPIP2;3 | -
    PIP2C | CePIP2;3 | CcPIP2;3 | RbPIP2;3, RbPIP2;4 | - | - | -
    PIP2D | CePIP2;4, CePIP2;5 | CcPIP2;4, CcPIP2;5 | RbPIP2;5 | - | - | -
    PIP2E | CePIP2;5 | CcPIP2;5 | RbPIP2;6 | - | - | -
    PIP2F | CePIP2;6 | CcPIP2;6 | - | - | - | -
    PIP2G | CePIP2;7 | CcPIP2;7 | RbPIP2;7 | JePIP2;3, JePIP2;4 | - | -
    PIP2H | CePIP2;8 | CcPIP2;8, CcPIP2;9 | RbPIP2;8 | JePIP2;5 | OsPIP2;6 | AtPIP2;7, AtPIP2;8
    PIP2I | CePIP2;9, CePIP2;10, CePIP2;11 | CcPIP2;10, CcPIP2;11 | RbPIP2;9, RbPIP2;10 | JePIP2;6 | OsPIP2;7, OsPIP2;8 | -
    At: A. thaliana; Cc: C. cristatella; Ce: C. esculentus; Je: J. effusus; Os: O. sativa; Rb: R. breviuscula; PIP: plasma membrane intrinsic protein.

    Tissue-specific expression profiles of CePIP genes were investigated using transcriptome data available for young leaf, mature leaf, sheath, root, rhizome, shoot apex, and tuber. As shown in Fig. 3a, CePIP genes were most highly expressed in roots, followed by sheaths; moderately expressed in tubers, young leaves, rhizomes, and mature leaves; and weakly expressed in shoot apexes. In most tissues, CePIP1;1, -2;1, and -2;8 represent three dominant members that contributed more than 90% of total transcripts. By contrast, in rhizome, these three members accounted for about 80% of total transcripts, which together with CePIP1;3 and -2;4 contributed up to 96%; in root, CePIP1;1, -1;3, -2;4, and -2;7 accounted for about 84% of total transcripts, which together with CePIP2;1 and -2;8 contributed up to 94%. According to their expression patterns, CePIP genes could be divided into five main clusters: Cluster I includes CePIP1;1, -2;1, and -2;8, which were constitutively and highly expressed in all tissues examined; Cluster II includes CePIP2;2, -2;9, and -2;10, which were weakly expressed in all tested tissues; Cluster III includes CePIP1;2 and -2;11, which were preferentially expressed in young leaf and sheath; Cluster IV includes CePIP1;3 and -2;4, which were predominantly expressed in root and rhizome; and Cluster V includes the remaining genes, which were typically expressed in root (Fig. 3a). Collectively, these results imply expression divergence of most duplicate pairs, and that three members (i.e., CePIP1;1, -2;1, and -2;8) have evolved to be constitutively co-expressed in most tissues.

    Figure 3.  Expression profiles of CePIP genes in various tissues, different stages of leaf development, and mature leaves over the diurnal cycle. (a) Tissue-specific expression profiles of 14 CePIP genes. The heatmap was generated in R with row-based standardization. Color scale represents log2-transformed FPKM values, where blue indicates low expression and red indicates high expression. (b) Expression profiles of CePIP1;1, -2;1, and -2;8 at different stages of leaf development. (c) Expression profiles of CePIP1;1, -2;1, and -2;8 in mature leaves over the diurnal cycle. Bars indicate SD (N = 3) and uppercase letters indicate significant differences tested following Duncan's one-way multiple-range post hoc ANOVA (p < 0.01). (Ce: C. esculentus; FPKM: Fragments per kilobase of exon per million fragments mapped; PIP: plasma membrane intrinsic protein).
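
    The heatmap scaling described in the caption (log2 transformation followed by row-based standardization) can be sketched for a single gene as follows. The pseudocount of 1 and the use of the sample standard deviation (the convention of R's `scale` function) are assumptions, since the caption does not give these details; the function name and input values are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def row_zscores(fpkm_row: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Log2-transform one gene's FPKM values, then standardize across tissues.

    Assumptions: a pseudocount avoids log2(0), and the sample standard
    deviation (n - 1 denominator) is used for scaling.
    """
    logged = [math.log2(v + pseudocount) for v in fpkm_row]
    center = mean(logged)
    spread = stdev(logged)
    if spread == 0:
        return [0.0] * len(logged)  # flat row: no variation to display
    return [(x - center) / spread for x in logged]


# Hypothetical FPKM values for one gene in three tissues.
z = row_zscores([0.0, 1.0, 3.0])
print(z)
```

    Because each row is standardized separately, colors in such a heatmap compare tissues within a gene rather than absolute expression between genes.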

    Transcriptome profiling showed that, compared with young leaves, CePIP1;2, -2;3, -2;7, -2;8, and -2;11 were significantly down-regulated in mature leaves, whereas CePIP1;3 and -2;1 were up-regulated (Fig. 3a). To confirm the results, three dominant members, i.e., CePIP1;1, -2;1, and -2;8, were selected for qRT-PCR analysis of three representative stages, i.e., young, mature, and senescing leaves. As shown in Fig. 3b, in contrast to CePIP2;1, which exhibited a bell-like expression pattern peaking in mature leaves, transcripts of both CePIP1;1 and -2;8 gradually decreased during leaf development. These results were largely consistent with transcriptome profiling; the only difference is that, in the qRT-PCR analysis, CePIP1;1 was significantly down-regulated in mature leaves relative to young leaves. However, this may be due to the different experimental conditions used, i.e., greenhouse vs natural conditions.

    Diurnal expression patterns of CePIP1;1, -2;1, and -2;8 were also investigated in mature leaves and results are shown in Fig. 3c. Generally, transcript levels of all three genes in the day (8, 12, 16, and 20 h) were higher than those at night (24 and 4 h). During the day, both CePIP1;1 and -2;8 exhibited a unimodal expression pattern that peaked at 12 h, whereas CePIP2;1 possessed two peaks (8 and 16 h) that did not differ significantly. Nevertheless, transcript levels of all three genes at 20 h (onset of night) were significantly lower than those at 8 h (onset of day) as well as 12 h. At night, no significant difference was observed between the two time points (24 and 4 h) for CePIP1;1 and -2;8, in contrast to CePIP2;1. Moreover, their transcript levels were comparable to those at 20 h (Fig. 3c).

    To reveal the expression patterns of CePIP genes during tuber development, three representative stages, i.e., 40 DAS (early swelling stage), 85 DAS (late swelling stage), and 120 DAS (mature stage), were first profiled using transcriptome data. As shown in Fig. 4a, except for CePIP1;2, -2;2, -2;9, and -2;10, which were rarely expressed, most genes exhibited a bell-like expression pattern peaking at 85 DAS, in contrast to a gradual decrease of CePIP2;3 and -2;8. Notably, except for CePIP2;4, all genes were expressed at considerably lower levels at 120 DAS than at 40 DAS. For qRT-PCR confirmation of CePIP1;1, -2;1, and -2;8, seven stages were examined, i.e., 1, 5, 10, 15, 20, 25, and 35 DAI, which represent initiation, five stages of swelling, and maturation as described before[32]. As shown in Fig. 4b, two peaks were observed for all three genes, though their patterns were different. As for CePIP1;1, compared with the initiation stage (1 DAI), significant up-regulation was observed at the early swelling stage (5 DAI), followed by a gradual decrease except for the appearance of a second peak at 20 DAI, which differs somewhat from transcriptome profiling. As for CePIP2;1, a sudden drop of transcripts first appeared at 5 DAI, followed by a gradual increase until 20 DAI and then a gradual decrease at the two late stages. The pattern of CePIP2;8 is similar to that of CePIP1;1: two peaks appeared at 5 and 20 DAI, and the second peak was significantly lower than the first. The difference is that the second peak of CePIP2;8 was significantly lower than the level at the initiation stage. By contrast, the second peak (20 DAI) of CePIP2;1 was significantly higher than the first one (1 DAI). Nevertheless, the expression patterns of both CePIP2;1 and -2;8 are highly consistent with transcriptome profiling.

    Figure 4.  Transcript and protein abundances of CePIP genes during tuber development. (a) Transcriptome-based expression profiling of 14 CePIP genes during tuber development. The heatmap was generated in R with row-based standardization. Color scale represents log2-transformed FPKM values, where blue indicates low expression and red indicates high expression. (b) qRT-PCR-based expression profiling of CePIP1;1, -2;1, and -2;8 in seven representative stages of tuber development. (c) Relative protein abundance of CePIP1;1, -2;1, and -2;8 in three representative stages of tuber development. Bars indicate SD (N = 3) and uppercase letters indicate significant differences tested following Duncan's one-way multiple-range post hoc ANOVA (p < 0.01). (Ce: C. esculentus; DAI: days after tuber initiation; DAS: days after sowing; FPKM: Fragments per kilobase of exon per million fragments mapped; PIP: plasma membrane intrinsic protein).

    Since protein abundance is not always in agreement with the transcript level, protein profiles of the three dominant members (i.e., CePIP1;1, -2;1, and -2;8) during tuber development were further investigated. For this purpose, we first took advantage of available proteomic data from leaves, roots, and four stages of tubers (freshly harvested, dried, rehydrated for 48 h, and sprouted) to identify CePIP proteins. As shown in Supplemental Fig. S2, all three proteins were identified in both leaves and roots, whereas CePIP1;1 and -2;8 were also identified in at least one of the four tested stages of tubers. Notably, all three proteins were considerably more abundant in roots, implying key roles in root water balance.

    To further uncover their profiles during tuber development, 4D-PRM-based protein quantification was conducted in three representative stages of tuber development, i.e., 1, 25, and 35 DAI. As expected, all three proteins were identified and quantified. In contrast to the gradual decrease of CePIP2;8, both CePIP1;1 and -2;1 exhibited a bell-like pattern that peaked at 25 DAI, though no significant difference was observed between 1 and 25 DAI (Fig. 4c). The trends are largely in accordance with their transcription patterns, though the reverse trend was observed for CePIP2;1 at the two early stages (Fig. 4b & Fig. 4c).

    As predicted by WoLF PSORT, CePIP1;1, -2;1, and -2;8 may function in the cell membrane. To confirm the result, subcellular localization vectors named pNC-Cam1304-CePIP1;1, pNC-Cam1304-CePIP2;1, and pNC-Cam1304-CePIP2;8 were further constructed. When transiently overexpressed in tobacco leaves, green fluorescence signals of all three constructs were confined to cell membranes, highly coinciding with red fluorescence signals of the plasma membrane marker HbPIP2;3-RFP (Fig. 5).

    Figure 5.  (a) Schematic diagram of overexpressing constructs, (b) subcellular localization analysis of CePIP1;1, -2;1, and -2;8 in N. benthamiana leaves. (35S: cauliflower mosaic virus 35S RNA promoter; Ce: C. esculentus; EGFP: enhanced green fluorescent protein; kb: kilobase; NOS: terminator of the nopaline synthase gene; RFP: red fluorescent protein; PIP: plasma membrane intrinsic protein).

    Water balance is particularly important for cell metabolism and enlargement, plant growth and development, and stress responses[2,19]. As the name suggests, AQPs raised considerable interest for their high permeability to water, and plasma membrane-localized PIPs were proven to play key roles in transmembrane water transport between cells[1,18]. The first PIP was discovered in human erythrocytes, which was named CHIP28 or AQP1, and its homolog in plants was first characterized in Arabidopsis, which is known as RD28, PIP2c, or AtPIP2;3[3,7,53]. Thus far, genome-wide identification of PIP genes has been reported in a large number of plant species, including the two model plants Arabidopsis and rice[10,11,13–17,54–56]. By contrast, little information is available on Cyperaceae, the third largest family within the monocot clade, which comprises more than 5,600 species[57].

    Given the crucial role of water balance in tuber development and crop production, in this study, tigernut, a representative Cyperaceae plant producing high amounts of oil in underground tubers[28,30,32], was employed to study PIP genes. A total of 14 PIP genes representing two phylogenetic groups (i.e., PIP1 and PIP2) or 12 orthogroups (i.e., PIP1A, PIP1B, PIP1C, PIP2A, PIP2B, PIP2C, PIP2D, PIP2E, PIP2F, PIP2G, PIP2H, and PIP2I) were identified from the tigernut genome. Although the family size is comparable to or smaller than the 13–21 members present in Arabidopsis, cassava (Manihot esculenta), rubber tree (Hevea brasiliensis), poplar (Populus trichocarpa), C. cristatella, R. breviuscula, banana (Musa acuminata), maize (Zea mays), sorghum (Sorghum bicolor), barley (Hordeum vulgare), and switchgrass (Panicum virgatum), it is larger than the four to 12 members found in eelgrass (Zostera marina), Brachypodium distachyon, foxtail millet (Setaria italica), J. effusus, Aquilegia coerulea, papaya (Carica papaya), castor bean (Ricinus communis), and physic nut (Jatropha curcas) (Supplemental Table S4). Among them, A. coerulea represents a basal eudicot that did not experience the γ WGD shared by all core eudicots[50], whereas eelgrass is an early diverged aquatic monocot that did not experience the τ WGD shared by all core monocots[56]. Interestingly, though both species possess two PIP1s and two PIP2s, they were shown to exhibit complex orthologous relationships of 1:1, 2:2, 1:0, and 0:1 (Supplemental Table S5). Whereas AcPIP1;1/AcPIP1;2/ZmPIP1;1/ZmPIP1;2 and ZmPIP2;1/AcPIP2;1 belong to PIP1A and PIP2A identified in this study, AcPIP2;2 and ZmPIP2;2 belong to PIP2H and PIP2I, respectively (Supplemental Table S5), implying that the last common ancestor of monocots and eudicots possessed only one PIP1 and two PIP2s, followed by clade-specific expansion. A good example is the generation of AtPIP1;1 and -2;6 from AtPIP1;5 and -2;1 via the γ WGD, respectively[17].

    In tigernut, extensive expansion of the PIP subfamily was contributed by WGD (2), transposed (2), tandem (5), and dispersed duplications (3). Notably, two transposed repeats (i.e., CePIP1;1/-1;3 and CePIP2;1/-2;8) are shared by rice, implying an early origin sometime after the split from the eudicot clade but before the Cyperaceae-Poaceae divergence. By contrast, two WGD repeats (i.e., CePIP1;1/-1;2 and CePIP2;2/-2;4) are shared by C. cristatella, R. breviuscula, and J. effusus but not by rice and Arabidopsis, implying that they may be derived from WGDs that occurred sometime after the Cyperaceae-Poaceae split but before the Cyperaceae-Juncaceae divergence. The most likely candidate is the WGD described in C. littledalei[58], though its exact timing still needs to be studied. Interestingly, compared with Arabidopsis (1) and rice (2), tandem/proximal duplications played a more important role in the expansion of PIP genes in tigernut (5) as well as in the other Cyperaceae species tested (5–6), and these duplications were shown to be Cyperaceae-specific or even species-specific. These tandem repeats may play a role in the adaptive evolution of Cyperaceae species, as described in a large number of plant species[14,41]. According to comparative genomics analyses, tandem duplicates experienced stronger selective pressure than genes formed by other modes (WGD, transposed duplication, and dispersed duplication) and evolved toward biased functional roles involved in plant self-defense[41].

    As observed in most species such as Arabidopsis[10,14–17], PIP genes in all Cyperaceae and Juncaceae species examined in this study, i.e., tigernut, C. cristatella, R. breviuscula, and J. effusus, feature three introns with conserved positions. By contrast, zero to three introns were found not only in rice but also in other Poaceae species such as maize, sorghum, foxtail millet, switchgrass, B. distachyon, and barley[54,55], implying lineage/species-specific evolution.

    Despite the extensive expansion of PIP genes (PIP2) in tigernut even after the split from R. breviuscula, CePIP1;1, -2;1, and -2;8 were shown to represent three dominant members in most tissues examined in this study, i.e., young leaf, mature leaf, sheath, rhizome, shoot apex, and tuber, though the situation in root is more complex. CePIP1;1 was characterized as a transposed repeat of CePIP1;3, which represents the most expressed member in root. Moreover, its recent WGD repeat CePIP1;2 was shown to be lowly expressed in most tested tissues, implying their divergence. The ortholog of CePIP1;1 in rice is OsPIP1;3 (RWC-3), which was shown to be preferentially expressed in roots, stems, and leaves, in contrast to the constitutive expression of OsPIP1;1 (OsPIP1a) and -1;2[59–61], two recent WGD repeats. Injecting the cRNA of OsPIP1;3 into Xenopus oocytes could increase the osmotic water permeability by 2–3 times[60], though the activity is considerably lower than that of PIP2s such as OsPIP2;2 and -2;2[61–63]. Moreover, OsPIP1;3 was shown to play a role in drought avoidance in upland rice, and its overexpression in lowland rice could increase root osmotic hydraulic conductivity, leaf water potential, and relative cumulative transpiration at the end of 10 h PEG treatment[64]. CePIP2;8 was characterized as a transposed repeat of CePIP2;1. Since their orthologs are present in both rice and Arabidopsis (Supplemental Table S3), the duplication event more likely occurred sometime before the monocot-eudicot split. Interestingly, their orthologs in rice, i.e., OsPIP2;1 (OsPIP2a) and -2;6, respectively, are also constitutively expressed[61], implying a conserved evolution with similar functions. When heterologously expressed in yeast, OsPIP2;1 was shown to exhibit high water transport activity[62,64–66]. Moreover, root hydraulic conductivity was decreased approximately fourfold in OsPIP2;1 RNAi knock-down rice plants[64]. The water transport activity of OsPIP2;6 has not been tested; however, it was proven to be an H2O2 transporter that is involved in resistance to rice blast[61]. More work, especially transgenic tests, may improve our knowledge of the function of these key CePIP genes.

    The leaf is a photosynthetic organ that regulates water loss through transpiration. In tigernut, PIP transcripts in leaves were mainly contributed by CePIP1;1, -2;1, and -2;8, implying their key roles. During leaf development, in contrast to the gradual decrease of CePIP1;1 and -2;8 transcripts across the three stages (i.e., young, mature, and senescing) examined in this study, CePIP2;1 peaked in mature leaves. Their high abundance in young leaves is consistent with cell elongation and enlargement at this stage, whereas upregulation of CePIP2;1 in mature leaves may indicate a possible role in photosynthesis[67]. Thus far, a large number of CO2-permeable PIPs have been identified, e.g., AtPIP2;1, HvPIP2;1, HvPIP2;2, HvPIP2;3, HvPIP2;5, and SiPIP2;7[68–70]. Moreover, in mature leaves, CePIP1;1, -2;1, and -2;8 were shown to exhibit an apparent diurnal expression pattern, with higher expression during the day and a peak usually at noon, which reflects transpiration and the fact that PIP genes are usually induced by light[11,71–73]. In rice, OsPIP2;4 and -2;5 also showed a clear diurnal fluctuation in roots that peaked at 3 h after the onset of light and dropped to a minimum 3 h after the onset of darkness[11]. Notably, further studies showed that the temporal and dramatic induction of OsPIP2;5 around 2 h after light initiation was triggered by transpirational demand but not by circadian rhythm[74].

    As an oil-bearing tuber crop, the main economic goal of tigernut cultivation is to harvest underground tubers, whose development is highly dependent on water availability[32,75]. According to previous studies, the moisture content of immature tigernut tubers remains above 80.0%, followed by a seed-like dehydration process with a drop of water content to less than 50% during maturation[28,32]. Thereby, the water balance in developing tubers must be tightly regulated. As in leaves, the majority of PIP transcripts in tubers were shown to be contributed by CePIP1;1, -2;1, and -2;8, which was further confirmed at the protein level. In accordance with the trend of water content during tuber development, mRNA and protein abundances of CePIP1;1, -2;1, and -2;8 in initiation and swelling tubers were considerably higher than those at the mature stage. High abundances of CePIP1;1, -2;1, and -2;8 at the initiation stage reflect rapid cell division and elongation, whereas upregulation of CePIP1;1 and -2;1 at the swelling stage is in accordance with cell enlargement and active physiological metabolism such as rapid oil accumulation[28,30]. At the mature stage, downregulation of PIP transcripts and protein abundances resulted in a significant drop in the moisture content, which is accompanied by the significant accumulation of late embryogenesis-abundant proteins[23,32]. The situation is highly distinct from other tuber plants such as potato (Solanum tuberosum), which may contribute to the difference in desiccation resistance between the two species[32,76]. Notably, in one study, CePIP2;1 was not detected in any of the four tested stages, i.e., freshly harvested, dried, rehydrated for 48 h, and sprouted tubers[23]. By contrast, it was quantified in all three stages of tuber development examined in this study, i.e., 1, 25, and 35 DAI (corresponding to freshly harvested tubers), which represent initiation, swelling, and maturation. One possible reason is that the protein abundance of CePIP2;1 in mature tubers is not high enough to be quantified by nanoLC-MS/MS, which is relatively less sensitive than the 4D-PRM used in this study[30,46]. In fact, nanoLC-MS/MS-based proteomic analysis of 30 samples representing six tissues/stages only resulted in 2,257 distinct protein groups[23].

    Taken together, our results imply a key role of CePIP1;1, -2;1, and -2;8 in tuber water balance; however, the underlying mechanism needs to be further studied, e.g., posttranslational modifications, protein interaction patterns, and transcriptional regulators.

    To our knowledge, this is the first genome-wide characterization of PIP genes in tigernut, a representative Cyperaceae plant with oil-bearing tubers. The 14 CePIP genes, representing two phylogenetic groups or 12 orthogroups, outnumber those present in the two model plants rice and Arabidopsis, and gene expansion was mainly contributed by WGD and transposed/tandem duplications, some of which are lineage-specific or even species-specific. Among these genes, CePIP1;1, -2;1, and -2;8 have evolved to be three dominant members that are constitutively expressed in most tissues, including leaf and tuber. Transcription of these three dominant members in leaves is subject to developmental and diurnal regulation, whereas in tubers, their mRNA and protein abundances are positively correlated with the moisture content during tuber development. Moreover, their plasma membrane localization was confirmed by subcellular localization analysis, implying that they may function in the cell membrane. These findings not only provide valuable information for further uncovering the mechanism of tuber water balance but also lay a solid foundation for genetic improvement by regulating these key PIP members in tigernut.

    The authors confirm contribution to the paper as follows: study conception and design, supervision: Zou Z; analysis and interpretation of results: Zou Z, Zheng Y, Xiao Y, Liu H, Huang J, Zhao Y; draft manuscript preparation: Zou Z, Zhao Y. All authors reviewed the results and approved the final version of the manuscript.

    All the relevant data is available within the published article.

    This work was supported by the Hainan Province Science and Technology Special Fund (ZDYF2024XDNY171 and ZDYF2024XDNY156), China; the National Natural Science Foundation of China (32460342, 31971688 and 31700580), China; the Project of Sanya Yazhou Bay Science and Technology City (SCKJ-JYRC-2022-66), China. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

  • The authors declare that they have no conflict of interest.

  • Supplemental Fig. S1 Neighbor joining dendrogram for the pools of the different populations.
    Supplemental Fig. S2 Graphical results of the different methods applied to infer the most likely number of clusters after the STRUCTURE analysis.
    Supplemental Fig. S3 Clustering of individuals for K = 2, K = 3, and K = 4.
    Supplemental Table S1 Genetic diversity of the pools.
    Supplemental Table S2 Pairwise genetic distance among individual pools.
    Supplemental Table S3 IDs and sequence information of the final SNP set; SNP-IDs contain the ID of the corresponding sequence (on the left of the underscore) and the SNP position within the sequence (on the right of the underscore).
    Supplemental Data File S1 FASTA file of the sequences containing the final SNP set.
  • [1] Jonášová M, Prach K. 2004. Central-European mountain spruce (Picea abies (L.) Karst.) forests: regeneration of tree species after a bark beetle outbreak. Ecological Engineering 23:15−27. doi: 10.1016/j.ecoleng.2004.06.010
    [2] Müller J, Bußler H, Goßner M, Rettelbach T, Duelli P. 2008. The European spruce bark beetle Ips typographus in a national park: from pest to keystone species. Biodiversity and Conservation 17:2979. doi: 10.1007/s10531-008-9409-1
    [3] Wermelinger B. 2004. Ecology and management of the spruce bark beetle Ips typographus—a review of recent research. Forest Ecology and Management 202:67−82. doi: 10.1016/j.foreco.2004.07.018
    [4] Gugerli F, Gall R, Meier F, Wermelinger B. 2008. Pronounced fluctuations of spruce bark beetle (Scolytinae: Ips typographus) populations do not invoke genetic differentiation. Forest Ecology and Management 256:405−9. doi: 10.1016/j.foreco.2008.04.038
    [5] Mayer F, Piel FB, Cassel-Lundhagen A, Kirichenko N, Grumiau L, et al. 2015. Comparative multilocus phylogeography of two Palaearctic spruce bark beetles: influence of contrasting ecological strategies on genetic variation. Molecular Ecology 24:1292−310. doi: 10.1111/mec.13104
    [6] Sallé A, Arthofer W, Lieutier F, Stauffer C, Kerdelhué C. 2007. Phylogeography of a host-specific insect: genetic structure of Ips typographus in Europe does not reflect past fragmentation of its host. Biological Journal of the Linnean Society 90:239−46. doi: 10.1111/j.1095-8312.2007.00720.x
    [7] Montano V, Bertheau C, Doležal P, Krumböck S, Okrouhlík J, et al. 2016. How differential management strategies affect Ips typographus L. dispersal. Forest Ecology and Management 360:195−204. doi: 10.1016/j.foreco.2015.10.037
    [8] Némethy M, Mihálik D, Steifetten Ø, Rošteková V, Mrkvová M, et al. 2018. Genetic differentiation between local populations of Ips typographus in the high Tatra Mountains range. Scandinavian Journal of Forest Research 33:215−21. doi: 10.1080/02827581.2017.1368697
    [9] Bertheau C, Schuler H, Arthofer W, Avtzis DN, Mayer F, et al. 2013. Divergent evolutionary histories of two sympatric spruce bark beetle species. Molecular Ecology 22:3318−32. doi: 10.1111/mec.12296
    [10] Krascsenitsová E, Kozánek M, Ferenčík J, Roller L, Stauffer C, et al. 2013. Impact of the Carpathians on the genetic structure of the spruce bark beetle Ips typographus. Journal of Pest Science 86:669−76. doi: 10.1007/s10340-013-0508-8
    [11] Dowle EJ, Bracewell RR, Pfrender ME, Mock KE, Bentz BJ, et al. 2017. Reproductive isolation and environmental adaptation shape the phylogeography of mountain pine beetle (Dendroctonus ponderosae). Molecular Ecology 26:6071−84. doi: 10.1111/mec.14342
    [12] Powell D, Große-Wilde E, Krokene P, Roy A, Chakraborty A, et al. 2021. A highly-contiguous genome assembly of the Eurasian spruce bark beetle, Ips typographus, provides insight into a major forest pest. Communications Biology 4:1059. doi: 10.1038/s42003-021-02602-3
    [13] Andersson MN, Grosse-Wilde E, Keeling CI, Bengtsson JM, Yuen MMS, et al. 2013. Antennal transcriptome analysis of the chemosensory gene families in the tree killing bark beetles, Ips typographus and Dendroctonus ponderosae (Coleoptera: Curculionidae: Scolytinae). BMC Genomics 14:198. doi: 10.1186/1471-2164-14-198
    [14] Yuvaraj JK, Roberts RE, Sonntag Y, Hou X, Grosse-Wilde E, et al. 2021. Putative ligand binding sites of two functionally characterized bark beetle odorant receptors. BMC Biology 19:16. doi: 10.1186/s12915-020-00946-6
    [15] Puechmaille SJ. 2016. The program STRUCTURE does not reliably recover the correct population structure when sampling is uneven: subsampling and new estimators alleviate the problem. Molecular Ecology Resources 16:608−27. doi: 10.1111/1755-0998.12512
    [16] Evanno G, Regnaut S, Goudet J. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology 14:2611−20. doi: 10.1111/j.1365-294X.2005.02553.x
    [17] Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945−59. doi: 10.1093/genetics/155.2.945
    [18] Yang H, You C, Tsui CKM, Tembrock LR, Wu Z, et al. 2021. Phylogeny and biogeography of the Japanese rhinoceros beetle, Trypoxylus dichotomus (Coleoptera: Scarabaeidae) based on SNP markers. Ecology and Evolution 11:153−73. doi: 10.1002/ece3.6982
    [19] Li H, Qu W, Obrycki JJ, Meng L, Zhou X, et al. 2020. Optimizing sample size for population genomic study in a global invasive lady beetle, Harmonia axyridis. Insects 11:290. doi: 10.3390/insects11050290
    [20] Shegelski VA. 2020. Mountain pine beetle dispersal: morphology, genetics, and range expansion. Dissertation. University of Alberta, Alberta
    [21] Foll M, Gaggiotti O. 2008. A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: a Bayesian perspective. Genetics 180:977−93. doi: 10.1534/genetics.108.092221
    [22] Whitlock MC, Lotterhos KE. 2015. Reliable detection of loci responsible for local adaptation: inference of a null model through trimming the distribution of FST. The American Naturalist 186:S24−S36. doi: 10.1086/682949
    [23] Excoffier L, Lischer HEL. 2010. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources 10:564−67. doi: 10.1111/j.1755-0998.2010.02847.x
    [24] Flanagan SP, Jones AG. 2017. Constraints on the FST–heterozygosity outlier approach. Journal of Heredity 108:561−73. doi: 10.1093/jhered/esx048
    [25] Nilssen AC. 1984. Long-range aerial dispersal of bark beetles and bark weevils (Coleoptera, Scolytidae and Curculionidae) in northern Finland. Annales Entomologici Fennici 50:37−42
    [26] Bertheau C, Salle A, Rossi J-P, Bankhead-dronnet S, Pineau X, et al. 2009. Colonisation of native and exotic conifers by indigenous bark beetles (Coleoptera: Scolytinae) in France. Forest Ecology and Management 258:1619−28. doi: 10.1016/j.foreco.2009.07.020
    [27] Gautier M, Foucaud J, Gharbi K, Cézard T, Galan M, et al. 2013. Estimation of population allele frequencies from next-generation sequencing data: pool-versus individual-based genotyping. Molecular Ecology 22:3766−79. doi: 10.1111/mec.12360
    [28] Schlötterer C, Tobler R, Kofler R, Nolte V. 2014. Sequencing pools of individuals – mining genome-wide polymorphism data without big funding. Nature Reviews Genetics 15:749−63. doi: 10.1038/nrg3803
    [29] Arvidsson S, Fartmann B, Winkler S, Zimmermann W. 2016. Efficient high-throughput SNP discovery and genotyping using normalised Genotyping-by-Sequencing (nGBS). LGC Technical Note: AN-161104.01. https://biosearch-cdn.azureedge.net/assetsv6/efficient-high-throughput-snp-discovery-genotyping-ngbs-app-note.pdf
    [30] Li W, Godzik A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658−59. doi: 10.1093/bioinformatics/btl158
    [31] Garsmeur O, Droc G, Antonise R, Grimwood J, Potier B, et al. 2018. A mosaic monoploid reference sequence for the highly complex genome of sugarcane. Nature Communications 9:2638. doi: 10.1038/s41467-018-05051-5
    [32] Liber M, Duarte I, Maia AT, Oliveira HR. 2021. The history of lentil (Lens culinaris subsp. culinaris) domestication and spread as revealed by genotyping-by-sequencing of wild and landrace accessions. Frontiers in Plant Science 12:628439. doi: 10.3389/fpls.2021.628439
    [33] Palumbo F, Qi P, Pinto VB, Devos KM, Barcaccia G. 2019. Construction of the first SNP-based linkage map using genotyping-by-sequencing and mapping of the male-sterility gene in leaf chicory. Frontiers in Plant Science 10:276. doi: 10.3389/fpls.2019.00276
    [34] Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nature Methods 9:357−59. doi: 10.1038/nmeth.1923
    [35] Garrison EP, Marth G. 2012. Haplotype-based variant detection from short-read sequencing. Preprint https://arxiv.org/abs/1207.3907
    [36] Knaus BJ, Grünwald NJ. 2017. vcfR: a package to manipulate and visualize variant call format data in R. Molecular Ecology Resources 17:44−53. doi: 10.1111/1755-0998.12549
    [37] Gruber B, Unmack PJ, Berry OF, Georges A. 2018. dartR: An R package to facilitate analysis of SNP data generated from reduced representation genome sequencing. Molecular Ecology Resources 18:691−99. doi: 10.1111/1755-0998.12745
    [38] Shen W, Le S, Li Y, Hu F. 2016. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLOS ONE 11:e0163962. doi: 10.1371/journal.pone.0163962
    [39] Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. Journal of Molecular Biology 215:403−10. doi: 10.1016/S0022-2836(05)80360-2
    [40] Götz S, García-Gómez JM, Terol J, Williams TD, Nagaraj SH, et al. 2008. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Research 36:3420−35. doi: 10.1093/nar/gkn176
    [41] Goudet J, Jombart T. 2020. hierfstat: Estimation and tests of hierarchical F-statistics. R package version 0.5-7. https://CRAN.R-project.org/package=hierfstat
    [42] Lischer HEL, Excoffier L. 2012. PGDSpider: an automated data conversion tool for connecting population genetics and genomics programs. Bioinformatics 28:298−99. doi: 10.1093/bioinformatics/btr642
    [43] Kamvar ZN, Tabima JF, Grünwald NJ. 2014. Poppr: an R package for genetic analysis of populations with clonal, partially clonal, and/or sexual reproduction. PeerJ 2:e281. doi: 10.7717/peerj.281
    [44] Kamvar ZN, Brooks JC, Grünwald NJ. 2015. Novel R tools for analysis of genome-wide population genetic data with emphasis on clonality. Frontiers in Genetics 6:208. doi: 10.3389/fgene.2015.00208
    [45] RStudio Team. 2021. RStudio: Integrated Development Environment for R. http://www.rstudio.com/
    [46] Chhatre VE, Emerson KJ. 2017. StrAuto: automation and parallelization of STRUCTURE analysis. BMC Bioinformatics 18:192. doi: 10.1186/s12859-017-1593-0
    [47] Li Y, Liu J. 2018. StructureSelector: A web-based software to select and visualize the optimal number of clusters using multiple methods. Molecular Ecology Resources 18:176−77. doi: 10.1111/1755-0998.12719
    [48] Kopelman NM, Mayzel J, Jakobsson M, Rosenberg NA, Mayrose I. 2015. CLUMPAK: a program for identifying clustering modes and packaging population structure inferences across K. Molecular Ecology Resources 15:1179−91. doi: 10.1111/1755-0998.12387
    [49] R Core Team. 2021. R: A language and environment for statistical computing. http://www.R-project.org/

  • Cite this article

    Müller M, Niesar M, Berens I, Gailing O. 2022. Genotyping by sequencing reveals lack of local genetic structure between two German Ips typographus L. populations. Forestry Research 2:1 doi: 10.48130/FR-2022-0001

Forestry Research 2, Article number: 1 (2022)

Abstract: The European spruce bark beetle (Ips typographus L.) is a serious pest in Norway spruce stands. While it usually attacks freshly fallen trees or trees with a reduced defense system, healthy trees can also be infested during massive outbreaks of I. typographus, which can occur after catastrophic events such as drought periods or storms. Knowledge of the genetic structure of this species, especially on local scales, is still ambiguous. While local population structure was reported in some studies, others did not detect any differentiation among I. typographus populations. Here, we used genotyping by sequencing to infer the genetic structure of two I. typographus populations in western Germany, which were approx. 58 km apart. Based on 16,830 SNPs we detected high genetic diversity, but very low genetic differentiation between the populations (FST: 0.001) and a lack of population structure. These results suggest a high dispersal ability of I. typographus.

    • The European spruce bark beetle (Ips typographus L.) is regarded as a keystone species in forest ecosystems driving forest regeneration[1,2]. At the same time, it is a serious pest in Norway spruce stands (Picea abies [L.] KARST.)[3]. Usually, I. typographus attacks freshly fallen spruce trees or trees that have a reduced defense system due to stress[4], but during massive outbreaks it can also attack healthy trees[3,5]. Massive population increases can occur after events such as drought periods, storms or clear cuts, and can lead to heavy losses of spruce tree stands. Therefore, knowledge of population dynamics and dispersal distances, as reflected in genetic structures, is needed to inform forest management and mitigation strategies.

      Several studies have analyzed the genetic structure of I. typographus populations using different genetic markers such as simple sequence repeats (SSRs)[4,6–8], mitochondrial markers[5,7,9,10], nuclear coding gene fragments[5], or ribosomal DNA (internal transcribed spacer (ITS))[9]. These studies, however, came to different conclusions. For instance, Sallé et al.[6] did not find population structure among I. typographus populations in Europe based on SSRs, while Mayer et al.[5] detected, based on a wider sampling and mitochondrial and nuclear coding gene fragments, a geographic subdivision into a northern and southern group of this species. On a more local scale, Krascsenitsová et al.[10] detected only slight genetic structure, but differences in haplotype distribution between Western/Southern Carpathians and the Eastern Carpathians using a mitochondrial marker, whereas Némethy et al.[8] detected no population structure of this species in the Carpathians based on SSRs. Using the same marker type, Montano et al.[7] detected population structure between I. typographus populations from managed and unmanaged spruce stands in the Bohemian forest and the Limestone Alps. In contrast, Gugerli et al.[4] reported a lack of local population structure among I. typographus populations in Switzerland. Thus, especially on the local scale, the extent of population structure in this species is not well known.

      The development of high-throughput sequencing (HTS) now makes it possible to investigate genome-wide data even in non-model species. For instance, Dowle et al.[11] used double-digest restriction-associated DNA (ddRAD) sequencing to investigate phylogeography and environmental adaptation in mountain pine beetle (Dendroctonus ponderosae Hopkins) populations across the entire distribution range of this species in western North America. HTS may also reveal a clearer pattern of population structure in I. typographus, but despite the recently published genome of I. typographus[12] and antennal transcriptome studies investigating chemosensation[13,14], there have been, to our knowledge, no studies analyzing genome-wide genetic variation in this species. Here, we applied genotyping by sequencing of pooled samples to identify genome-wide SNPs (single nucleotide polymorphisms) in I. typographus, and used these SNPs to infer population structure between two I. typographus populations in Germany. We hypothesize that a genome-wide marker set including potentially adaptive SNPs would reveal more distinct population structure compared to previously used marker sets from more restricted parts of the genome.

    • Sequencing yielded 630 million reads, i.e., ~10 million reads per pool. In total, 794,341 SNPs were identified across all pools. The initial filtering step (total number of fully covered SNPs in 10% of pools, MAF ≥ 0.05, min. read count of 8) led to 321,562 SNPs. Further filtering for a higher call rate (0.8) and linkage disequilibrium (R² < 0.5) reduced the number of SNPs to 29,031 and 17,748, respectively. The exclusion of the population Engelskirchen due to an unknown number of sampled trees reduced the SNP number to 17,717. In total, 664 out of 11,225 cluster reference sequences, in which the SNPs were located, could not be assigned to the I. typographus genome (see Materials and Methods). SNPs located in these sequences were removed (in total 887 SNPs), leading to a final SNP set of 16,830 SNPs.
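
      The call-rate and minor allele frequency (MAF) criteria named above can be illustrated with a minimal sketch. It is not the pipeline used in the study (filtering was done by the sequencing provider and with dartR); the genotype coding (0, 1, 2 alternate-allele counts, None for a missing call), the function names, and the toy SNPs are all assumptions made for illustration.

```python
def call_rate(calls):
    """Fraction of pools with a non-missing call for one SNP."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls):
    """MAF from calls coded as alternate-allele counts (0, 1, 2); assumed coding."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    p_alt = sum(observed) / (2 * len(observed))
    return min(p_alt, 1 - p_alt)


def filter_snps(snps, min_call_rate=0.8, min_maf=0.05):
    """Keep SNPs that pass both thresholds (values taken from the text)."""
    return {
        name: calls
        for name, calls in snps.items()
        if call_rate(calls) >= min_call_rate
        and minor_allele_frequency(calls) >= min_maf
    }


# Hypothetical toy data: three SNPs scored in ten pools.
snps = {
    "snp_a": [0, 1, 1, 2, 0, 1, 0, 1, 2, 1],           # passes both filters
    "snp_b": [0, 1, None, None, None, 1, 0, 1, 0, 0],  # call rate 0.7, dropped
    "snp_c": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],           # monomorphic, dropped
}
kept = filter_snps(snps)
print(sorted(kept))
```

      The linkage disequilibrium filter and the minimum read count are deliberately left out, since they need pairwise locus statistics and read-depth data that this sketch does not model.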

    • Observed heterozygosity (Ho) was 0.245 in Ahlefeld and 0.258 in Arnsberg (Table 1). Expected heterozygosity (He) was 0.265 in Ahlefeld and 0.275 in Arnsberg, and allelic richness (Ar) was 1.84 in Ahlefeld and 1.83 in Arnsberg. The inbreeding coefficients (Fis Ahlefeld: 0.077, Fis Arnsberg: 0.061) were not significantly different from zero in the two populations. The genetic differentiation between populations was very low (FST: 0.001) and not significant (Table 1).

      Table 1.  Genetic diversity indices and genetic differentiation of the populations.

      Population   N    Ho      He      Ar     Fis     FST
      Ahlefeld     21   0.245   0.265   1.84   0.077   0.001
      Arnsberg     14   0.258   0.275   1.83   0.061
      Over all     35   0.241   0.259   1.83   0.069
      N: number of pools, Ho: observed heterozygosity, He: expected heterozygosity, Ar: allelic richness, Fis: inbreeding coefficient (not significantly different from zero), FST: fixation index (not significant)
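
      As a rough plausibility check, the Fis values in Table 1 can be compared with the common definition Fis = 1 − Ho/He. The sketch below assumes this definition and uses only the rounded table values; the estimator actually used in the study averages over loci and may apply sample-size corrections, so small deviations from the reported values are expected.

```python
# Ho and He per population, copied from Table 1.
table_1 = {
    "Ahlefeld": (0.245, 0.265),
    "Arnsberg": (0.258, 0.275),
    "Over all": (0.241, 0.259),
}


def fis(ho, he):
    """Inbreeding coefficient under the assumed definition Fis = 1 - Ho/He."""
    return 1 - ho / he


fis_values = {pop: round(fis(ho, he), 3) for pop, (ho, he) in table_1.items()}
for pop, value in fis_values.items():
    print(f"{pop}: Fis ~ {value}")
```

      The recomputed values lie within a few thousandths of the reported 0.077, 0.061, and 0.069, which is consistent with rounding of the inputs.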

      The genetic diversity of the pools measured as observed heterozygosity (Ho) was very similar and ranged from 0.227 to 0.250 (Supplemental Table S1). Also, the pairwise genetic distances among individual pools were very similar (Supplemental Table S2), and of the same magnitude among pools within populations as well as between populations (mean Hamming distance of pools both within populations and between populations: 0.2).
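
      To make the distance measure concrete, the following sketch computes a Hamming distance as the proportion of loci at which two pools carry different genotype calls. This is a simplified reading of the measure; the handling of missing data and the genotype coding are assumptions, and the pool names and values are hypothetical.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Proportion of loci with differing calls; loci with a missing call are skipped."""
    compared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not compared:
        raise ValueError("no loci with calls in both pools")
    return sum(x != y for x, y in compared) / len(compared)


# Hypothetical genotype calls (alternate-allele counts) at five loci.
pools = {
    "pool_1": [0, 1, 2, 1, 0],
    "pool_2": [0, 1, 1, 1, 0],
    "pool_3": [0, 2, 2, None, 0],
}

pairwise = {
    (p, q): hamming_distance(pools[p], pools[q])
    for p, q in combinations(pools, 2)
}
for pair, dist in pairwise.items():
    print(pair, dist)
```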

      The AMOVA revealed that 99.92% of the variation can be found within populations and 0.08% among populations.

      Principal component analysis (PCA) did not detect principal components (PCs) that explain a large amount of variance. The first and second PCs each explain 3.4% of the variance. The populations were not clearly separated in the PCA, and pools taken from the same tree were not more similar compared to pools taken from different trees (Fig. 1). Similar results were obtained for the neighbor joining dendrogram (Supplemental Fig. S1).

      Figure 1.  Principal component analysis (PCA) of the pools. Similar numbers refer to pools of samples taken from the same tree.

      The MaxMean K method[15] revealed K = 2 as the most likely number of clusters, while the ΔK method[16] revealed K = 3. All other methods (ln Pr(X|K)[17], MedMed K, MedMean K, and MaxMed K[15]) revealed K = 1 as the most likely number of clusters (Supplemental Fig. S2). No distinct cluster assignment was found for the two populations, but the population Arnsberg showed a higher proportion of the blue cluster than the population Ahlefeld, when assuming K = 2 (Supplemental Fig. S3). Nevertheless, the genetic differentiation of the clusters was very low (net nucleotide distance assuming two clusters: 0.0002). Thus, there is likely no structure between the two populations (K = 1).
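
      One reason the ΔK method can disagree with the other criteria is that it relies on a second-order difference of the log-likelihood and is therefore undefined for the smallest and largest K tested, so it can never return K = 1. The sketch below illustrates this with one common formulation (absolute second difference of the mean ln Pr(X|K), divided by the standard deviation across replicate runs). The log-likelihood values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def delta_k(ln_prob):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    ln_prob maps K to a list of ln Pr(X|K) values from replicate runs.
    """
    result = {}
    for k in sorted(ln_prob):
        if k - 1 not in ln_prob or k + 1 not in ln_prob:
            continue  # second difference is undefined at the edges
        second_diff = abs(
            mean(ln_prob[k + 1]) - 2 * mean(ln_prob[k]) + mean(ln_prob[k - 1])
        )
        result[k] = second_diff / stdev(ln_prob[k])
    return result


# Hypothetical replicate log-likelihoods for K = 1..4 (five runs each).
ln_prob = {
    1: [-1000, -1001, -999, -1000, -1000],
    2: [-1010, -1030, -990, -1020, -1000],
    3: [-1050, -1052, -1048, -1050, -1050],
    4: [-1150, -1180, -1120, -1160, -1140],
}

dk = delta_k(ln_prob)
best_by_delta_k = max(dk, key=dk.get)
best_by_mean_lnp = max(ln_prob, key=lambda k: mean(ln_prob[k]))
print(dk, best_by_delta_k, best_by_mean_lnp)
```

      In this toy example the mean log-likelihood is highest at K = 1, yet ΔK is only defined for K = 2 and K = 3 and picks K = 3, mirroring the kind of disagreement between criteria described above.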

      Of the three applied programs for the detection of outliers (BayeScan, OutFLANK, and Arlequin), only Arlequin detected three outlier loci (SNPs '54442-930_229', '45651-1144_76', and '45292-1156_123') located in sequences (GenBank accession numbers JADDUH010000001.1, JADDUH010000006.1, and JADDUH010000010.1) within contigs 1, 6, and 10 of the I. typographus genome[12]. Only for the surrounding sequence of SNP '54442-930_229' an annotation was obtained (hypothetical protein YQE_03355, partial [Dendroctonus ponderosae]).

    • The overall observed (Ho) and expected heterozygosity (He) of the populations was 0.241 and 0.259, respectively. Since, to our knowledge, there are no other diversity data based on SNPs available for I. typographus, it is not possible to directly compare genetic diversity to other populations. Studies based on genome-wide SNP data of other Coleoptera species revealed, for instance, values of 0.111 (Ho) and 0.257 (He) for the Japanese rhinoceros beetle (Trypoxylus dichotomus L.)[18], 0.078 (Ho) and 0.087 (He) for the invasive lady beetle Harmonia axyridis Pall.[19], and 0.162 (Ho) and 0.180 (He) for the mountain pine beetle Dendroctonus ponderosae Hopkins[20]. There are more studies available that used SSR markers for the estimation of genetic diversity in I. typographus populations, in which higher values of diversity indices are expected compared to SNPs, due to the higher number of alleles usually present at SSR loci. For instance, Gugerli et al.[4] reported values of He ranging from 0.463 to 0.560, Montano et al.[7] reported values ranging from 0.387 to 0.469, and Némethy et al.[8] found a mean value of He of 0.687 among populations. Thus, the genetic diversity of I. typographus populations seems to be comparatively high. The inbreeding coefficient (Fis) was not significantly different from zero, hence there are no indications of homo- or heterozygosity excesses in the populations. We further found very low population differentiation (FST: 0.001) in our study and a lack of population structure. These results are in agreement with other studies that analyzed population differentiation of I. typographus based on SSR markers on a local scale[4,8,10]. Only Montano et al.[7] detected population structure between I. typographus populations from managed and unmanaged spruce stands in the Bohemian forest and the Limestone Alps.

      Thus, in contrast to our hypothesis, even the use of a genome-wide marker set involving potentially adaptive genetic variation did not reveal any population structure between populations. Two of three programs used for the detection of outlier loci (BayeScan[21], OutFLANK[22], and Arlequin[23]) did not reveal any outliers. Only Arlequin detected three outlier SNPs (SNPs '54442-930_229', '45651-1144_76', and '45292-1156_123'), which were located in the contigs 1, 6, and 10 of the I. typographus genome[12]. Since only two populations were compared in our study, FST-heterozygosity outlier methods as implemented in Arlequin may not perform well (instead BayeScan should be suitable)[24]. Therefore, the outlier loci revealed by Arlequin in this study may be false positives.

      Our results indicate a high connectivity of the populations and random mating. Indeed, a high dispersal ability of I. typographus is assumed[4,6,7,25]. Since this species develops on weakened or recently dead trees, which are usually scarce and distributed over the landscape, it can be expected that I. typographus has evolved efficient foraging capacities[6]. Thus, wind-supported dispersal distances of 43 km can be expected for this species[25]. Montano et al.[7] even estimated a dispersal distance of more than 100 km, whereby several smaller intervening forest patches between the study areas likely helped to maintain connectivity. The distance between the populations observed in our study was approx. 58 km, and there were forest stands located in between the two study areas. Hence, it can be expected that there is migration between the two populations. Additionally, the sampling was conducted in a time of high population density of I. typographus in the study area. The beetles also colonized pine trees, which has been observed previously[5,26]. We, however, did not detect genetic differences of I. typographus individuals inhabiting spruce or pine in our study (data not shown).

      We used genotyping-by-sequencing of pooled samples in this study, since the DNA extracted from heads and legs of single beetles was too low in quantity for sequencing. In general, pool-GBS leads to allele frequency estimates that are similar to estimates based on analysis of individuals[27], but the accuracy of allele frequency estimates might be affected by unequal amounts of DNA from each individual in the pool[28]. Since we did not use equal amounts of DNA for pooling (tissues were pooled for DNA extraction), each individual might not have contributed in the same way to the final pool. Nevertheless, we sequenced several pools per population and the pools showed very similar diversities (Supplemental Table S1). Therefore, we assume that the pooling did not strongly affect the results of our study.

    • We used GBS to investigate the genetic structure between two I. typographus populations in western Germany. We found high genetic diversity of the analyzed populations, but very low population differentiation. These results suggest a high dispersal ability of the European spruce bark beetle. The set of 16,830 SNPs provided in this study can be used in future studies of I. typographus. In the future, more populations spanning larger areas may be sampled to detect genomic signatures of selection. Further, environmental variables could be jointly investigated with the genomic data to conduct environmental association studies.

    • In three populations (Ahlefeld, Arnsberg, and Engelskirchen) located in the German federal state of North Rhine-Westphalia, spruce bark beetles were sampled from standing and lying trees in 2020. In Ahlefeld and Arnsberg, five trees each were sampled, whereas an unknown number of trees were sampled in the population Engelskirchen (Table 2). Since the exact number of trees sampled in Engelskirchen is unknown and the beetles of all samples were mixed in this population, samples of the Engelskirchen population were only used for SNP identification, but not for population genetic analysis. The beetles were directly sampled into 80% EtOH or first frozen and subsequently conserved in 80% EtOH.

      Table 2.  Overview of the sampled populations.

      Population      Latitude      Longitude    No. of sampled trees   No. of pools
      Engelskirchen   50.97610798   7.41474115   NA                     28
      Ahlefeld        50.99651943   7.55328433   5                      21
      Arnsberg        51.44245304   7.99021258   5                      14
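
      The distance of approx. 58 km between Ahlefeld and Arnsberg can be reproduced from the coordinates in Table 2. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6,371 km; both the choice of formula and the radius are assumptions, since the text does not state how the distance was obtained.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Coordinates from Table 2.
ahlefeld = (50.99651943, 7.55328433)
arnsberg = (51.44245304, 7.99021258)

distance_km = haversine_km(*ahlefeld, *arnsberg)
print(f"Ahlefeld to Arnsberg: {distance_km:.1f} km")
```
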
    • To avoid negative effects of gut content on the sequencing, only heads and legs of the beetles were used for DNA isolation. A first attempt at DNA isolation based on single beetles yielded too little DNA for sequencing. Therefore, heads and legs of five beetles of each sample were pooled for DNA isolation, which led to sufficient DNA quality and quantity. In total, 63 pools were sent to LGC Genomics for DNA isolation (Table 2).

    • Library preparation, normalized genotyping by sequencing (nGBS[29]), and SNP identification were conducted by LGC Genomics. Paired-end sequencing (2 × 150 bp) was conducted on an Illumina NextSeq 550 system aiming at 10 million reads per sample. Raw sequencing reads were deposited in the NCBI Sequence Read Archive (SRA) under BioProject number PRJNA781394. Since variable alignment rates between 54.9% and 92.5% (mean 75.9%) of the pools to the I. typographus genome[12] were observed, we decided to build a cluster reference for read mapping. Thus, after demultiplexing and quality trimming, clustering of combined reads was conducted with CD-HIT-EST v4.6.1[30]. This widely used program (for its use with GBS data see e.g., Garsmeur et al.[31], Liber et al.[32], Palumbo et al.[33]) sorts the sequences from long to short, whereby the longest sequence becomes the representative of the first cluster. Afterwards, each sequence is compared with the representative sequences of existing clusters. If the similarity is above a given threshold, the sequence is grouped into the cluster; if the threshold is not reached, a new cluster is defined[30]. We allowed up to 5% differences for clustering. The reads were aligned against the cluster reference using Bowtie2 v2.2.3[34]. Variant discovery was conducted with Freebayes v1.0.2-16[35]. A first filtering of SNPs was conducted (total number of fully covered SNPs in 10% of samples (pools), MAF ≥ 0.05, min. read count of 8), and the corresponding VCF file was used for further analysis (for further filtering see below).
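
      The greedy clustering logic described above (sort from long to short, compare each sequence with existing representatives, join or open a new cluster) can be sketched in a few lines. This is only an illustration of the idea and not a reimplementation of CD-HIT-EST: the similarity function is a crude ungapped positional identity chosen for simplicity, and the reads are hypothetical.

```python
def similarity(representative, seq):
    """Ungapped positional identity over the shorter sequence (assumed stand-in)."""
    length = min(len(representative), len(seq))
    matches = sum(x == y for x, y in zip(representative, seq))
    return matches / length


def greedy_cluster(sequences, threshold=0.95):
    """Greedy incremental clustering; returns a list of (representative, members)."""
    clusters = []
    for seq in sorted(sequences, key=len, reverse=True):
        for representative, members in clusters:
            if similarity(representative, seq) >= threshold:
                members.append(seq)  # similar enough: join the existing cluster
                break
        else:
            clusters.append((seq, [seq]))  # no match: seq founds a new cluster
    return clusters


# Hypothetical reads; the threshold of 0.95 mirrors "up to 5% differences".
reads = [
    "ACGTTGCAAGGCTTAACC",         # shorter read, prefix of the first cluster
    "ACGTTGCAAGGCTTAACCGGATCCA",  # becomes the first representative
    "TTTTGGGGCCCCAAAATTGGCCAAT",  # unrelated, founds a second cluster
    "ACGTTGCAAGGCTTAACCGGATCCT",  # one mismatch in 25 bases (96% identity)
]

clusters = greedy_cluster(reads)
for representative, members in clusters:
    print(representative, len(members))
```

      Because every sequence is compared only with cluster representatives and never with other members, the result depends on the sort order, which is why the longest sequence is processed first.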

    • The R package vcfR v1.12.0[36] was used to convert the VCF file described above into the genlight format readable by the R package dartR[37]. The dartR v1.8.3 package[37] was used for further filtering of the SNPs regarding call rate (set to 0.8, i.e., SNPs need to be present in at least 80% of all samples) and linkage disequilibrium (R² < 0.5). To remove potential contaminations from our SNP set (i.e., the underlying cluster reference sequences), we only kept SNPs that were located in sequences that were successfully assigned to the I. typographus genome. For this, we first filtered the cluster reference for sequences that contained SNPs from our SNP set using SeqKit v2.0.0[38]. For these sequences, blastn[39] searches against the I. typographus genome[12] were performed using Blast2Go v5.2.5[40]. SNPs located in sequences that were not assigned to the I. typographus genome were removed from our final SNP set. The final SNP set can be found in Supplemental Table S3 and the corresponding sequences in Supplemental Data File S1.

      The R package hierfstat v0.5-7[41] was used to calculate observed heterozygosity (Ho), expected heterozygosity (He), allelic richness (Ar), inbreeding coefficient (Fis), and fixation index (FST). Confidence intervals for Fis and FST were calculated using 1,000 bootstraps over loci. Ho of single pools was calculated with dartR. PGDSpider v2.1.1.5[42] was used for input file conversion, and subsequently Analysis of Molecular Variance (AMOVA) based on 1,000 permutations was conducted in Arlequin v3.5.2.2[23]. dartR was used to conduct a principal component analysis (PCA) of the pools. A neighbor joining dendrogram based on Hamming distance and 1,000 bootstrap replicates was constructed with the R package poppr v2.8.7[43,44]. The same R package was also used to calculate pairwise genetic distances (Hamming distance) among individual pools. Computationally intensive tasks were performed on the RStudio server v1.4.1106[45] of the Gesellschaft für wissenschaftliche Datenverarbeitung Göttingen (GWDG).

      STRUCTURE v2.3.4[17] was used to infer population structure. The admixture model and correlated allele frequencies were used. A burn-in period of 10,000 and Markov chain Monte Carlo (MCMC) replicates of 100,000 were used. Potential clusters (K) from 1 to 4 were tested using 5 iterations. STRUCTURE was run on the high performance computing system of the GWDG using StrAuto v1.0[46]. StructureSelector[47] was used to determine the most likely number of K based on different methods such as ΔK[16], ln Pr(X|K)[17], and the methods proposed by Puechmaille[15] (MedMed K, MedMean K, MaxMed K, and MaxMean K). CLUMPAK[48] was used for summation and graphical representation of the STRUCTURE results.

      Three different types of software were used for the detection of outlier loci between the two populations. BayeScan v2.1[21] was run using default parameters including 100,000 iterations and a burn-in period of 50,000. The prior odds for the neutral model were set to 1,000, and a q-value threshold of 10% was chosen to determine significant outliers. OutFLANK[22] implemented in the R package dartR v1.8.3[37] was run using default parameters. Finally, Arlequin v3.5.2.2[23] was run with the non-hierarchical finite island model using 100,000 simulations and 100 simulated demes. P-values were adjusted using the p.adjust R function[49] applying a false discovery rate (FDR) of 0.05 to determine significant outliers. Annotations for significant outlier loci were obtained by searching the relevant sequences against the NCBI non-redundant protein sequences database using BLASTX[39]. For all mentioned analyses in R, R v4.0.4[49] was used.
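
      The FDR adjustment of the outlier P-values can be illustrated with a short sketch of the Benjamini-Hochberg procedure. The text does not state which p.adjust method was used, so Benjamini-Hochberg is an assumption here, and the P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-locus P-values from an outlier scan.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted = benjamini_hochberg(p_values)
significant = [i for i, p in enumerate(adjusted) if p < 0.05]
print(adjusted, significant)
```

      In this toy example two loci with raw P-values just below 0.05 are no longer significant after adjustment, which shows why the correction matters when many SNPs are tested at once.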

    • We acknowledge funding by the Ministry for Environment, Agriculture, Conservation and Consumer Protection of the State of North Rhine-Westphalia.

      • The authors declare that they have no conflict of interest.

      • Supplemental Fig. S1 Neighbor joining dendrogram for the pools of the different populations.
      • Supplemental Fig. S2 Graphical results of the different methods applied to infer the most likely number of clusters after the STRUCTURE analysis.
      • Supplemental Fig. S3 Clustering of individuals for K = 2, K = 3, and K = 4.
      • Supplemental Table S1 Genetic diversity of the pools.
      • Supplemental Table S2 Pairwise genetic distance among individual pools.
      • Supplemental Table S3 IDs and sequence information of the final SNP set; SNP-IDs contain the ID of the corresponding sequence (on the left of the underscore) and SNP position within the sequence (on the right of the underscore).
      • Supplemental Data File S1 FASTA file of the sequences containing the final SNP set.
      • Copyright: © 2022 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.
  • About this article
    Cite this article
    Müller M, Niesar M, Berens I, Gailing O. 2022. Genotyping by sequencing reveals lack of local genetic structure between two German Ips typographus L. populations. Forestry Research 2:1 doi: 10.48130/FR-2022-0001
