SNP filtering criteria suitable for kinship estimation
Although some studies have shown that using more SNPs can increase the number of identified kinships (Mendes et al., 2022), we prioritized an SNP dataset with minimal errors and applied strict filtering conditions, even though this reduced the number of SNPs. We did so because, when a large number of SNPs obtained under loose filtering conditions was used, COLONY 2.0 did not yield satisfactory kinship estimates when the assumed error rate was high. We ultimately arrived at what appear to be the best filtering conditions (MIN_DP = 5, MAX_DP = 50, MIN_MEAN_DP = 15, MIN_GQ = 30, CR > 0.9, MAF ≥ 0.03, HWE < 0.00001, HET ≤ 0.7, LD ≤ 0.2), which accurately reproduced known kinships in both COLONY 2.0 and Sequoia. Fine adjustment of the HET or HWE values, which led to successful kinship estimation in both programs, likely played a crucial role in the success of our study; we then used this combination of conditions to estimate unknown kinship relationships. We are therefore confident that we have identified the conditions required to generate a set of SNPs that yields relatively reliable kinship estimates in the target population.
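To make the thresholds listed above concrete, the following minimal Python sketch applies them to per-genotype and per-SNP summary statistics. It is an illustration only, not the pipeline used in this study: the names `SiteStats`, `genotype_passes`, and `site_passes` are hypothetical, the direction of each comparison is our reading of the text, and the HWE criterion is assumed to mean that SNPs with an HWE test p-value below 0.00001 are excluded. LD pruning (LD ≤ 0.2) is a pairwise operation between SNPs and is not shown.

```python
from dataclasses import dataclass

# Threshold values are taken from the text; comparison directions are assumptions.
MIN_DP, MAX_DP = 5, 50   # per-genotype read depth bounds
MIN_MEAN_DP = 15         # minimum mean depth per SNP
MIN_GQ = 30              # minimum genotype quality
MIN_CALL_RATE = 0.9      # CR > 0.9 (strict inequality, as written)
MIN_MAF = 0.03           # MAF >= 0.03
HWE_P = 1e-5             # assumed: SNPs with HWE p-value below this are dropped
MAX_HET = 0.7            # HET <= 0.7 (observed heterozygosity)


@dataclass
class SiteStats:
    """Hypothetical per-SNP summary statistics."""
    mean_dp: float
    call_rate: float
    maf: float
    hwe_p: float
    het_obs: float


def genotype_passes(dp: int, gq: int) -> bool:
    """Keep a single genotype call only if its depth and quality are acceptable."""
    return MIN_DP <= dp <= MAX_DP and gq >= MIN_GQ


def site_passes(s: SiteStats) -> bool:
    """Keep a SNP only if every site-level criterion is satisfied."""
    return (
        s.mean_dp >= MIN_MEAN_DP
        and s.call_rate > MIN_CALL_RATE
        and s.maf >= MIN_MAF
        and s.hwe_p >= HWE_P
        and s.het_obs <= MAX_HET
    )
```

In this sketch, a SNP with a mean depth of 20, call rate of 0.95, MAF of 0.10, HWE p-value of 0.5, and observed heterozygosity of 0.4 would be retained, whereas the same SNP with a MAF of 0.01 would be removed.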