-
Sequencing revealed 630 million reads, which are ~10 million reads per pool. In total, 794,341 SNPs were identified across all pools. The initial filtering step (total number of fully covered SNPs in 10% of pools, MAF ≥ 0.05, min. read count of 8) led to 321,562 SNPs. Further filtering for a higher call rate (0.8), and linkage disequilibrium (R2 < 0.5) reduced the number of SNPs to 29,031 and 17,748, respectively. The exclusion of the population Engelskirchen due to an unknown number of sampled trees reduced the SNP number to 17,717. In total, 664 out of 11,225 cluster reference sequences, in which the SNPs were located, could not be assigned to the I. typographus genome (see Materials and Methods). SNPs located in these sequences were removed (in total 887 SNPs) leading to a final SNP set of 16,830 SNPs.
Genetic diversity and differentiation
-
Observed heterozygosity (Ho) was 0.245 in Ahlefeld and 0.258 in Arnsberg (Table 1). Expected heterozygosity (He) was 0.265 in Ahlefeld and 0.275 in Arnsberg, and allelic richness (Ar) was 1.84 in Ahlefeld and 1.83 in Arnsberg. The inbreeding coefficients (Fis Ahlefeld: 0.077, Fis Arnsberg: 0.061) were not significantly different from zero in the two populations. The genetic differentiation between populations was very low (FST: 0.001) and not significant (Table 1).
Table 1. Genetic diversity indices and genetic differentiation of the populations.
Population N Ho He Ar Fis FST Ahlefeld 21 0.245 0.265 1.84 0.077 0.001 Arnsberg 14 0.258 0.275 1.83 0.061 Over all 35 0.241 0.259 1.83 0.069 N: number of pools, Ho: observed heterozygosity, He: expected heterozygosity, Fis: inbreeding coefficient (not significantly different from zero), FST: fixation index (not significant) The genetic diversity of the pools measured as observed heterozygosity (Ho) was very similar and ranged from 0.227 to 0.250 (Supplemental Table S1). Also, the pairwise genetic distances among individual pools were very similar (Supplemental Table S2), and of the same magnitude among pools within populations as well as between populations (mean Hamming distance of pools both within populations and between populations: 0.2).
The AMOVA revealed that 99.92% of the variation can be found within populations and 0.08% among populations.
Principal component analysis (PCA) did not detect principal components (PCs) that explain a large amount of variance. The first and second PCs explain both 3.4% of the variance. The populations were not clearly separated in the PCA, and pools taken from the same tree were not more similar compared to pools taken from different trees (Fig. 1). Similar results were obtained for the neighbor joining dendrogram (Supplemental Fig. S1).
Figure 1.
Principle component analysis (PCA) of the pools. Similar numbers refer to pools of samples taken from the same tree.
The MaxMean K method[15] revealed K = 2 as the most likely number of clusters, while the Δ K method[16] revealed K = 3. All other methods (ln Pr(XǀK)[17], MedMed K, MedMean K, and MaxMed K[15]) revealed K = 1 as the most likely number of clusters (Supplemental Fig. S2). No distinct cluster assignment was found for the two populations, but the population Arnsberg showed a higher proportion of the blue cluster than the population Ahlefeld, when assuming K = 2 (Supplemental Fig. S3). Nevertheless, the genetic differentiation of the clusters was very low (net nucleotide distance assuming two clusters: 0.0002). Thus, there is likely no structure between the two populations (K = 1).
Of the three applied programs for the detection of outliers (BayeScan, OutFLANK, and Arlequin), only Arlequin detected three outlier loci (SNPs '54442-930_229', '45651-1144_76', and '45292-1156_123') located in sequences (GenBank accession numbers JADDUH010000001.1, JADDUH010000006.1, and JADDUH010000010.1) within contigs 1, 6, and 10 of the I. typographus genome[12]. Only for the surrounding sequence of SNP '54442-930_229' an annotation was obtained (hypothetical protein YQE_03355, partial [Dendroctonus ponderosae]).
-
In three populations (Ahlefeld, Arnsberg, and Engelskirchen) located in the German federal state North Rhine-Westphalia, spruce bark beetles were sampled from standing and lying trees in 2020. In Ahlefeld and Arnsberg five trees each were sampled, whereas an unknown number of trees were sampled in the population Engelskirchen (Table 2). Since the exact number of trees sampled in Engelskirchen is unknown and the beetles of all samples were mixed in this population, samples of the Engelskirchen population were only used for SNP identification, but not for population genetic analysis. The beetles were directly sampled into 80% EtOH or first frozen and subsequently conserved in 80% EtOH.
Table 2. Overview of the sampled populations.
Population Latitude Longitude No. of
sampled treesNo. of
poolsEngelskirchen 50.97610798 7.41474115 NA 28 Ahlefeld 50.99651943 7.55328433 5 21 Arnsberg 51.44245304 7.99021258 5 14 DNA extraction
-
To avoid negative effects of gut content on the sequencing, only heads and legs of the beetles were used for DNA isolation. A first attempt of DNA isolation based on single beetles revealed too low DNA quantity for sequencing. Therefore, heads and legs of five beetles of each sample were pooled for DNA isolation, which led to a sufficient DNA quality and quantity. In total, 63 pools were sent to LGC Genomics for DNA isolation (Table 2).
Genotyping by sequencing and SNP identification
-
Library preparation, normalized genotyping by sequencing (nGBS[29]), and SNP identification was conducted by LGC Genomics. Paired-end sequencing (2 × 150 bp) was conducted on an Illumina NextSeq 550 system aiming at 10 million reads per sample. Raw sequencing reads were deposited in the NCBI Sequence Read Archive (SRA) under BioProject number PRJNA781394. Since variable alignment rates between 54.9% and 92.5% (mean 75.9%) of the pools to the I. typographus genome[12] were observed, we decided to build a cluster reference for read mapping. Thus, after demultiplexing and quality trimming, clustering of combined reads was conducted with CD-HIT-EST v4.6.1[30]. This widely used program (for its use with GBS data see e.g., Garsmeur et al.[31], Liber et al.[32], Palumbo et al.[33]) sorts the sequences from long to short, whereas the longest sequence becomes the representative of the first cluster. Afterwards each sequence is compared with the representative sequences of existing clusters. If the similarity is above a given threshold, the sequence is grouped into the cluster, if the threshold is not reached, a new cluster is defined[30]. We allowed up to 5% differences for clustering. The reads were aligned against the cluster reference using Bowtie2 v2.2.3[34]. Variant discovery was conducted with Freebayes v1.0.2-16[35]. A first filtering of SNPs was conducted (total number of fully covered SNPs in 10% of samples (pools), MAF ≥ 0.05, min. read count of 8), and the corresponding VCF file used for further analysis (for further filtering see below).
Data analysis
-
The R package vcfR v1.12.0[36] was used to convert the VCF file described above into the genlight format readable by the R package dartR[37]. The dartR v1.8.3 package[37] was used for further filtering of the SNPs regarding call rate (set to 0.8, i.e., SNPs need to be present in 80% of all samples) and linkage disequilibrium (R2 < 0.5). To remove potential contaminations from our SNP set (i.e., the underlying cluster reference sequences) we only kept SNPs that were located in sequences that were successfully assigned to the I. typographus genome. For this, we first filtered the cluster reference for sequences that contained SNPs from our SNP set using SeqKit v2.0.0[38]. For these sequences, blastn[39] searches against the I. typographus genome[12] were performed using Blast2Go v5.2.5[40]. SNPs located in sequences that were not assigned to the I. typographus genome were removed from our final SNP set. The final SNP set can be found in Supplemental Table S3 and the corresponding sequences in Supplemental Data File S1. The R package hierfstat v0.5-7[41] was used to calculate observed heterozygosity (Ho), expected heterozygosity (He), allelic richness (Ar), inbreeding coefficient (Fis), and fixation index (FST). Confidence intervals for Fis and FST were calculated using 1,000 bootstraps over loci. Ho of single pools was calculated with dartR. PGDSpider v2.1.1.5[42] was used for input file conversion, and subsequently Analysis of Molecular Variance (AMOVA) based on 1000 permutations was conducted in Arlequin v3.5.2.2[23]. DartR was used to conduct a principle component analysis (PCA) of the pools. A neighbor joining dendrogram based on Hamming distance and 1000 bootstrap replicates was constructed with the R package poppr v2.8.7[43,44]. The same R package was also used to calculate pairwise genetic distances (Hamming distance) among individual pools. Computationally intensive tasks were performed on the Rstudio server v1.4.1106[45] of the Gesellschaft für wissenschaftliche Datenverarbeitung Göttingen (GWDG). STRUCTURE v2.3.4[17] was used to infer population structure. The admixture model and correlated allele frequencies were used. A burn-in period of 10,000 and Markov chain Monte Carlo (MCMC) replicates of 100,000 were used. Potential clusters (K) from 1 to 4 were tested using 5 iterations. STRUCTURE was run on the high performance computing system of the GWDG using StrAuto v1.0[46]. StructureSelector[47] was used to determine the most likely number of K based on different methods such as Δ K[16], ln Pr(XǀK)[17], and the methods proposed by Puechmaille[15] MedMed K, MedMean K, MaxMed K, and MaxMean K. CLUMPAK[48] was used for summation and graphical representation of the STRUCTURE results. Three different types of software were used for the detection of outlier loci between the two populations. BayeScan v2.1[21] was run using default parameters including 100,000 iterations and a burn-in period of 50,000. The prior odds for the neutral model were set to 1000, and a q-value threshold of 10% was chosen to determine significant outliers. OutFLANK[22] implemented in the R package dartR v1.8.3[37] was run using default parameters. Finally, Arlequin v3.5.2.2[23] was run with the non-hierarchical finite island model using 100,000 simulations and 100 simulated demes. P-values were adjusted using the p.adjust R function[49] applying a false discovery rate (FDR) of 0.05 to determine significant outliers. Annotations for significant outlier loci were obtained by searching the relevant sequences against the NCBI non-redundant protein sequences database using BLASTX[39]. For all mentioned analyses in R, R v4.0.4[49] was used.
-
We acknowledge funding by the Ministry for Environment, Agriculture, Conservation and Consumer Protection of the State of North Rhine-Westphalia.
-
About this article
Cite this article
Müller M, Niesar M, Berens I, Gailing O. 2022. Genotyping by sequencing reveals lack of local genetic structure between two German Ips typographus L. populations. Forestry Research 2:1 doi: 10.48130/FR-2022-0001
Genotyping by sequencing reveals lack of local genetic structure between two German Ips typographus L. populations
- Received: 09 December 2021
- Accepted: 18 January 2022
- Published online: 26 January 2022
Abstract: The European spruce bark beetle (Ips typographus L.) is a serious pest in Norway spruce stands. While usually attacking freshly fallen trees or trees with a reduced defense system, also healthy trees can be infested during massive outbreaks of I. typographus that can occur after catastrophic events such as drought periods or storms. Knowledge of the genetic structure of this species, especially on local scales is still ambiguous. While local population structure was reported in some studies, others did not detect any differentiation among I. typographus populations. Here, we used genotyping by sequencing to infer the genetic structure of two I. typographus populations in western Germany, which had a distance of approx. 58 km from each other. Based on 16,830 SNPs we detected high genetic diversity, but very low genetic differentiation between the populations (FST: 0.001) and a lack of population structure. These results suggest a high dispersal ability of I. typographus.
-
Key words:
- GBS /
- Genome-wide /
- Coleoptera /
- Bark beetle /
- Pest /
- Forest /
- Genetics