2.7 | Local ancestry and genome-wide patterns of
introgression
Haplotypes were estimated from SNP genotypes at ddRAD loci on each
chromosome through imputation and phasing using Beagle 5.2 (Browning et
al. 2018; Browning et al. 2021). Sampled trees at both sites were
divided into Qc reference individuals (Qd ancestry S = 0), admixed individuals (0 < S <
1), and Qd reference individuals (S = 1). We converted the
haplotype matrix in each of the three VCF files (the Qc reference
individuals, the Qd reference individuals, and the admixed
individuals) into a NumPy array using the function vcf2npy. Local
ancestry at ddRAD loci of the admixed individuals was described as
homozygous for alleles derived from either the Qc or the Qd ancestral population, or heterozygous for alleles derived from both the Qc and Qd ancestral populations. The three ancestry types
were inferred from the haplotype arrays using the Python package Loter
with the functions lc.loter_smooth and loter.locanc.local_ancestry
(Dias-Alves et al. 2018). The package assumes that haplotypes of admixed
individuals originated through hybridization and recombination between the ancestral
populations represented by the reference individuals (Dias-Alves et al. 2018). To
visualize genome-wide patterns of introgression, we calculated the mean
number of alleles derived from Qd at each locus in the admixed
individuals.
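The per-locus summary described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used in the study: it assumes a haplotype-level ancestry array with labels 0 (Qc-derived) and 1 (Qd-derived), and that the two haplotypes of each individual occupy consecutive rows. The function name mean_qd_dosage and the toy array are hypothetical.

```python
import numpy as np


def mean_qd_dosage(hap_ancestry):
    """Return per-individual Qd dosage and its per-locus mean.

    hap_ancestry: array of shape (2 * n_individuals, n_loci) holding 0/1
    ancestry labels, where 1 marks a Qd-derived allele. Rows 2i and 2i + 1
    are ASSUMED to be the two haplotypes of individual i.
    """
    hap = np.asarray(hap_ancestry)
    if hap.ndim != 2 or hap.shape[0] % 2 != 0:
        raise ValueError("expected a 2-D array with an even number of haplotype rows")
    # Summing the two haplotypes gives the three ancestry types:
    # 0 = Qc homozygote, 1 = heterozygote, 2 = Qd homozygote.
    dosage = hap[0::2] + hap[1::2]
    # Mean number of Qd-derived alleles at each locus across individuals.
    return dosage, dosage.mean(axis=0)


# Toy data: two admixed individuals (four haplotypes) at three loci.
toy = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
])
dosage, mean_qd = mean_qd_dosage(toy)
```

Under these assumptions the mean ranges from 0 (all alleles Qc-derived) to 2 (all alleles Qd-derived) at each locus.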
To depict genome-wide introgression patterns by another method, we
estimated Patterson's D statistic at ddRAD loci (Martin et
al. 2015). The D statistic requires allele frequencies of four
populations with the phylogenetic relationship (((P1, P2), P3), O), and
positive D values indicate introgression from P3 into P2,
distinguishing it from incomplete lineage sorting. We assigned the Qc reference individuals to P1, the admixed individuals to P2, the Qd reference individuals to P3, and the Q. robur reference
sequence to O. We calculated the number of derived alleles that are
different from the Q. robur reference (ancestral) alleles for
individuals of P1, P2, and P3 from the VCF files and obtained the
frequencies of derived alleles, $p_1$, $p_2$, and $p_3$, respectively,
using the package gaston 1.5.9 in R 3.3.2. Using the ABBA-BABA
statistics, $C_{\mathrm{ABBA}} = (1 - p_1)\,p_2\,p_3$ and $C_{\mathrm{BABA}} = p_1\,(1 - p_2)\,p_3$, at each locus, we obtained
$D = (\sum C_{\mathrm{ABBA}} - \sum C_{\mathrm{BABA}}) / (\sum C_{\mathrm{ABBA}} + \sum C_{\mathrm{BABA}})$
in sliding windows of 101 neighboring loci, which sufficiently reduced
errors in the calculation of D.
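The windowed calculation can be illustrated with the following sketch, which takes per-locus derived-allele frequencies for P1, P2, and P3 and returns D for every window of consecutive loci. It is an illustration under stated assumptions rather than the analysis code: the function name sliding_d is hypothetical, the window is assumed to advance one locus at a time, and windows in which both sums are zero are returned as NaN.

```python
import numpy as np


def sliding_d(p1, p2, p3, window=101):
    """Patterson's D in sliding windows of `window` consecutive loci.

    p1, p2, p3: per-locus derived-allele frequencies in P1, P2, and P3.
    Returns an array of length n_loci - window + 1 (step of one locus,
    which is an assumption of this sketch).
    """
    p1, p2, p3 = (np.asarray(x, dtype=float) for x in (p1, p2, p3))
    if not (p1.shape == p2.shape == p3.shape) or p1.ndim != 1:
        raise ValueError("p1, p2, p3 must be 1-D arrays of equal length")
    if p1.size < window:
        raise ValueError("fewer loci than the window size")
    # Per-locus ABBA and BABA statistics from allele frequencies.
    c_abba = (1.0 - p1) * p2 * p3
    c_baba = p1 * (1.0 - p2) * p3
    # Convolving with a kernel of ones yields the sum over each window.
    kernel = np.ones(window)
    s_abba = np.convolve(c_abba, kernel, mode="valid")
    s_baba = np.convolve(c_baba, kernel, mode="valid")
    denom = s_abba + s_baba
    d = np.full(denom.shape, np.nan)
    # D is undefined where the window contains no ABBA or BABA signal.
    np.divide(s_abba - s_baba, denom, out=d, where=denom > 0)
    return d


# Toy example with four loci and a window of two loci.
d_toy = sliding_d([0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 1, 1], window=2)
```

In the toy example the first window contains only ABBA patterns, the last only BABA patterns, and the middle window an equal mix, so D moves from positive through zero to negative.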