2.7 | Local ancestry and genome-wide patterns of introgression
Haplotypes were estimated from SNP genotypes at ddRAD loci on each chromosome through imputation and phasing with Beagle 5.2 (Browning et al. 2018; Browning et al. 2021). Sampled trees at both sites were divided into Qc reference individuals (Qd ancestry S = 0), admixed individuals (0 < S < 1), and Qd reference individuals (S = 1). We converted the haplotype matrix in each of the three VCF files (Qc reference individuals, Qd reference individuals, and admixed individuals) into a NumPy array using the function vcf2npy. Local ancestry at each ddRAD locus in the admixed individuals was classified into three types: homozygous for alleles derived from the Qc ancestral population, homozygous for alleles derived from the Qd ancestral population, or heterozygous with one allele derived from each ancestral population. The three ancestry types were inferred from the haplotype arrays using the Python package Loter with the function lc.loter_smooth from the module loter.locanc.local_ancestry (Dias-Alves et al. 2018). Loter assumes that the haplotypes of admixed individuals originated from hybridization and recombination between the ancestral populations represented by the reference individuals (Dias-Alves et al. 2018). To visualize genome-wide patterns of introgression, we calculated the mean number of Qd-derived alleles at each locus across the admixed individuals.
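The final summary step can be sketched as follows. This is a minimal illustration rather than the script used in the study: the function name mean_qd_dosage is hypothetical, and the input is assumed to be a haplotype-by-locus array of ancestry calls (0 = Qc, 1 = Qd) in which consecutive pairs of rows are the two haplotypes of one admixed individual.

```python
import numpy as np


def mean_qd_dosage(hap_ancestry):
    """Summarize haplotype-level ancestry calls per locus.

    hap_ancestry: array of shape (2 * n_individuals, n_loci) with
    0 = Qc-derived and 1 = Qd-derived (assumed coding). Rows 2i and
    2i + 1 are assumed to be the two haplotypes of individual i.

    Returns (dosage, mean_dosage): dosage has shape
    (n_individuals, n_loci) with values 0, 1, or 2 (Qc homozygote,
    heterozygote, Qd homozygote); mean_dosage has shape (n_loci,).
    """
    hap_ancestry = np.asarray(hap_ancestry)
    if hap_ancestry.ndim != 2 or hap_ancestry.shape[0] % 2 != 0:
        raise ValueError("expected a 2-D array with an even number of rows")
    n_ind = hap_ancestry.shape[0] // 2
    # Group the two haplotypes of each individual and count Qd alleles.
    dosage = hap_ancestry.reshape(n_ind, 2, -1).sum(axis=1)
    # Average the Qd allele count over individuals at each locus.
    return dosage, dosage.mean(axis=0)


# Toy input: two individuals (four haplotypes) at three loci.
example = np.array([[0, 1, 1],
                    [0, 0, 1],
                    [1, 1, 0],
                    [0, 1, 0]])
dosage, mean_dosage = mean_qd_dosage(example)
```

In this coding, a dosage of 0, 1, or 2 corresponds to the three ancestry types described above, and the per-locus mean is the quantity plotted along the genome.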
As a second way to depict genome-wide introgression patterns, we estimated Patterson’s D statistic at ddRAD loci (Martin et al. 2015). The D statistic requires allele frequencies of four populations with the phylogenetic relationship (((P1, P2), P3), O); positive D values indicate introgression from P3 into P2, as distinguished from incomplete lineage sorting. We assigned the Qc reference individuals to P1, the admixed individuals to P2, the Qd reference individuals to P3, and the Q. robur reference sequence to O. From the VCF files, we counted the derived alleles, that is, alleles differing from the Q. robur reference (ancestral) allele, in the individuals of P1, P2, and P3 and obtained the derived allele frequencies $p_1$, $p_2$, and $p_3$, respectively, using the package gaston 1.5.9 in R 3.3.2. Using the ABBA-BABA statistics at each locus, $C_{ABBA} = (1 - p_1)\,p_2\,p_3$ and $C_{BABA} = p_1\,(1 - p_2)\,p_3$, we obtained $D = (\sum C_{ABBA} - \sum C_{BABA}) / (\sum C_{ABBA} + \sum C_{BABA})$ in sliding windows of 101 neighboring loci, a window size that sufficiently reduced errors in the calculation of D.
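The windowed calculation can be illustrated with a short Python sketch. The study computed allele frequencies with gaston in R, so this is not the original code: the function names are hypothetical, the frequency vectors are assumed to be ordered by position along a chromosome, and the window is assumed to advance one locus at a time.

```python
import numpy as np


def abba_baba_counts(p1, p2, p3):
    """Per-locus C_ABBA and C_BABA from derived allele frequencies."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    c_abba = (1.0 - p1) * p2 * p3
    c_baba = p1 * (1.0 - p2) * p3
    return c_abba, c_baba


def sliding_d(p1, p2, p3, window=101):
    """Patterson's D in windows of `window` neighboring loci.

    Returns one D value per window position (step of one locus,
    an assumption of this sketch); windows in which both sums are
    zero are returned as NaN.
    """
    c_abba, c_baba = abba_baba_counts(p1, p2, p3)
    if window < 1 or window > c_abba.size:
        raise ValueError("window must be between 1 and the number of loci")
    kernel = np.ones(window)
    # Sums of C_ABBA and C_BABA over each run of `window` loci.
    s_abba = np.convolve(c_abba, kernel, mode="valid")
    s_baba = np.convolve(c_baba, kernel, mode="valid")
    denom = s_abba + s_baba
    d = np.full_like(denom, np.nan)
    np.divide(s_abba - s_baba, denom, out=d, where=denom > 0)
    return d


# Toy input: three loci, window of two loci.
d_values = sliding_d([0, 0, 1], [1, 1, 0], [1, 1, 1], window=2)
```

Summing the numerator and denominator separately before dividing, rather than averaging per-locus ratios, keeps loci with little ABBA or BABA signal from dominating a window.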