SNP filtering criteria suitable for kinship estimation
Although some studies have shown that using more SNPs can increase the
number of identified kinships (Mendes et al., 2022), we chose to
prioritize the creation of an SNP dataset with minimal errors by
applying filtering conditions, even if it meant a reduction in the
number of SNPs. This is because kinship estimation in COLONY 2.0 did
not yield satisfactory results when a large number of SNPs filtered
under loose conditions was used with a high assumed error rate. We
ultimately identified what were probably the best filtering
conditions (MIN_DP = 5, MAX_DP = 50, MIN_MEAN_DP = 15, MIN_GQ ≥ 30,
CR > 0.9, MAF ≥ 0.03, HWE < 0.00001, HET ≤ 0.7,
LD ≤ 0.2) that accurately reproduced known kinships in both COLONY 2.0
and Sequoia. Subtle adjustments of the HET or HWE thresholds, which led
to successful kinship estimation in both COLONY 2.0 and Sequoia, likely
played a crucial role in the success of our study. We used this
combination of filtering conditions to estimate unknown kinship
relationships. We are confident
that we have identified the conditions required to generate a set of
SNPs that can produce relatively reliable results for kinship analysis
in the target population.
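
To make the combined thresholds concrete, the following is a minimal Python sketch of how such filters might be applied to per-genotype and per-SNP summary statistics. It is an illustration under stated assumptions, not the pipeline used in this study: the record type `SiteStats`, its field names, and the helper functions are hypothetical. The sketch also assumes that the HWE threshold is an exclusion cutoff (SNPs with an HWE p-value below 0.00001 are removed) and that HET refers to the observed heterozygosity of a SNP. LD pruning (LD ≤ 0.2) requires pairwise comparisons between SNPs and is not shown.

```python
from dataclasses import dataclass

# Thresholds copied from the text; how each one is applied is an assumption.
MIN_DP, MAX_DP, MIN_GQ = 5, 50, 30   # per-genotype depth and genotype quality
MIN_MEAN_DP = 15                     # per-SNP mean depth
MIN_CALL_RATE = 0.9                  # CR > 0.9
MIN_MAF = 0.03                       # MAF >= 0.03
HWE_P_CUTOFF = 1e-5                  # assumed: SNPs with p < cutoff are removed
MAX_HET = 0.7                        # HET <= 0.7


@dataclass
class SiteStats:
    """Hypothetical per-SNP summary record (field names are illustrative)."""
    snp_id: str
    mean_dp: float
    call_rate: float
    maf: float
    hwe_p: float
    het: float


def genotype_passes(dp: int, gq: int) -> bool:
    """Return True if a single genotype call meets the depth and GQ limits."""
    return MIN_DP <= dp <= MAX_DP and gq >= MIN_GQ


def site_passes(s: SiteStats) -> bool:
    """Return True if a SNP meets every site-level threshold."""
    return (
        s.mean_dp >= MIN_MEAN_DP
        and s.call_rate > MIN_CALL_RATE
        and s.maf >= MIN_MAF
        and s.hwe_p >= HWE_P_CUTOFF
        and s.het <= MAX_HET
    )


def filter_sites(sites):
    """Return the IDs of the SNPs that pass all site-level filters."""
    return [s.snp_id for s in sites if site_passes(s)]
```

Under these assumptions, a SNP with a mean depth of 20, a call rate of 0.95, an MAF of 0.10, an HWE p-value of 0.5, and a heterozygosity of 0.4 would be retained, whereas the same SNP with an MAF of 0.01 would be removed.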