Genotyping and population assignment
A total of 3253 adult individuals captured and blood-sampled during 1998–2013 were successfully genotyped on our custom house sparrow Affymetrix Axiom 200,000 SNP array (Lundregan et al., 2018). Based on the Affymetrix MonoHigh and PolyHigh quality criteria, 185,587 SNPs were passed on to further quality control, in which potential duplicates (identity by state > 0.98) and low-quality samples (genotyping rate < 0.90) were removed from the data set. Loci with potentially high levels of genotyping error (SNP call rate < 95%; Mendelian error rate based on parental relationships > 5%) or low minor allele frequency (MAF < 0.01) were also excluded. In total, 3116 individuals and 183,145 SNPs passed quality control (Lundregan et al., 2018). In this data set, missing genotypes (0.76% of the 570,679,820 genotypes in total) were imputed using LinkImpute (Money et al., 2015) to improve statistical power in our GWAS. Finally, a metapopulation-level pedigree was constructed from parentage analyses of individual high-density SNP genotype data (Niskanen et al., 2020). Both parents were known for 52.7% of the individuals in the pedigree and one parent for 25.0%; the remaining individuals had no parental information in the pedigree.
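The call-rate and allele-frequency filters described above can be illustrated with a minimal sketch. It is not the pipeline used for this data set: the function names, the 0/1/2 genotype coding with `None` for missing calls, and the order of operations (samples filtered first, SNP statistics then computed on the retained samples) are assumptions made for illustration. Duplicate detection by identity by state and the Mendelian error filter are omitted.

```python
from typing import List, Optional, Tuple

# Assumed coding: 0, 1 or 2 copies of one allele; None marks a missing call.
Genotype = Optional[int]


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of non-missing calls for one sample (row) or one SNP (column)."""
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_freq(calls: List[Genotype]) -> float:
    """Minor allele frequency from the non-missing diploid genotypes."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    p = sum(observed) / (2 * len(observed))
    return min(p, 1.0 - p)


def qc_filter(
    geno: List[List[Genotype]],
    min_sample_rate: float = 0.90,  # samples below this genotyping rate are dropped
    min_snp_rate: float = 0.95,     # SNPs below this call rate are dropped
    min_maf: float = 0.01,          # SNPs below this MAF are dropped
) -> Tuple[List[int], List[int]]:
    """Return indices of retained samples (rows) and retained SNPs (columns)."""
    keep_samples = [
        i for i, row in enumerate(geno) if call_rate(row) >= min_sample_rate
    ]
    keep_snps = []
    n_snps = len(geno[0]) if geno else 0
    for j in range(n_snps):
        # SNP statistics are computed on the retained samples only (assumption).
        column = [geno[i][j] for i in keep_samples]
        if not column:
            continue
        if call_rate(column) >= min_snp_rate and minor_allele_freq(column) >= min_maf:
            keep_snps.append(j)
    return keep_samples, keep_snps


# Toy matrix: 5 samples (rows) by 4 SNPs (columns), invented for illustration.
TOY = [
    [0, 1, 2, 0],
    [1, 1, 2, 0],
    [None, None, None, 0],  # poorly genotyped sample
    [2, 0, 2, None],
    [1, 2, 2, 0],
]

if __name__ == "__main__":
    # Thresholds are relaxed here because the toy matrix is tiny.
    print(qc_filter(TOY, min_sample_rate=0.7, min_snp_rate=0.9))
```

In the toy matrix, the third sample fails the sample-level filter, the third SNP is monomorphic among the retained samples, and the fourth SNP falls below the SNP call-rate threshold, so only the first two SNPs are retained.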
High-quality information on natal dispersal, from either mark-recapture or genetic assignment data, was available for 2741 adult birds present on one of the eight main study islands during 1998–2013 (Saatoglu et al., 2021). For the remaining 375 successfully genotyped individuals, we had information on the island where they were first recorded (either as a fledged juvenile in the autumn or as a 1-year-old recruit during summer). These individuals were removed from the phenotype data set, as were individuals whose natal island was not among our eight main study islands (N = 98) and individuals that could be assigned only to a group of natal islands rather than to a specific one of the eight main study islands, because SNP genotyping of birds from the farm and non-farm islands had been initiated in different years (N = 41; see Saatoglu et al., 2021). Thus, phenotypic data on dispersal for a total of 2602 individuals were used in the animal model analyses and GWAS.
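The sample accounting in this paragraph can be checked with a short tally. The sketch below uses only the counts reported in the text and assumes that the three excluded groups do not overlap, which is consistent with the reported totals; the variable and function names are illustrative.

```python
# Counts taken from the text; the grouping labels are paraphrased.
GENOTYPED_AFTER_QC = 3116

EXCLUSIONS = {
    "only island of first record known": 375,
    "natal island outside the eight main study islands": 98,
    "assigned only to a group of natal islands": 41,
}


def retained(total: int, exclusions: dict) -> int:
    """Individuals left after removing each excluded group (assumed disjoint)."""
    return total - sum(exclusions.values())


if __name__ == "__main__":
    print(retained(GENOTYPED_AFTER_QC, EXCLUSIONS))
```

The result matches the 2602 individuals used in the animal model analyses and GWAS, and subtracting only the first group from 3116 recovers the 2741 birds with high-quality dispersal information.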