Genome-wide SNP heterozygosity estimation and functional SNP
categorization
To assess the genetic diversity of the T. ichikawai genome, we
estimated the genome-wide SNP heterozygosity of the reference
individual. Initially, stLFR barcode-trimmed MGISEQ reads were mapped to
the reference genome using NextGenMap (Sedlazeck et al., 2013), and a
binary format alignment/map (BAM) file was generated. The BAM file was
sorted with SAMtools version 1.7 (Li, H. et al., 2009). Next,
local realignment around INDELs in the sorted BAM file was performed with
GATK v3.8.1 (McKenna et al., 2010). Then, a genomic variant call
format (GVCF) file of the reference individual was generated by GATK
HaplotypeCaller with options -hets 0.001 and -indelHeterozygosity 0.001.
Finally, SNPs of the reference individual were called and an output
variant call format (VCF) file was generated using the GATK GenotypeGVCFs
tool. For genotyped SNPs, variant filtering was applied using the GATK
VariantFiltration
tool with the following cutoff values: MQ > 30.00, SOR < 4.000,
QD > 2.00, FS < 60.000, MQRankSum > –20.000, ReadPosRankSum > –10.000,
and ReadPosRankSum < 10.000.
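
The retention criteria listed above can be illustrated with a minimal Python sketch. This is not the GATK implementation: the names THRESHOLDS and passes_filters are hypothetical, the annotations are assumed to be already parsed from the VCF INFO field into a dictionary of floats, and a missing annotation is assumed not to fail a SNP.

```python
# Retention thresholds as listed in the text: (exclusive lower, exclusive upper).
# None means that side is unbounded. THRESHOLDS is a hypothetical name.
THRESHOLDS = {
    "MQ": (30.0, None),
    "SOR": (None, 4.0),
    "QD": (2.0, None),
    "FS": (None, 60.0),
    "MQRankSum": (-20.0, None),
    "ReadPosRankSum": (-10.0, 10.0),
}


def passes_filters(info, thresholds=THRESHOLDS):
    """Return True if a SNP's annotations satisfy every cutoff.

    `info` maps annotation names to floats. Assumption: an annotation
    that is absent from `info` is skipped rather than treated as failing.
    """
    for key, (lower, upper) in thresholds.items():
        value = info.get(key)
        if value is None:
            continue  # assumption: missing annotations do not fail the SNP
        if lower is not None and not value > lower:
            return False
        if upper is not None and not value < upper:
            return False
    return True


# Hypothetical annotation values for a single SNP record.
example = {"MQ": 60.0, "SOR": 1.2, "QD": 25.0, "FS": 0.5,
           "MQRankSum": 0.0, "ReadPosRankSum": 0.3}
print(passes_filters(example))
```

All comparisons are strict, matching the inequalities in the text, so a SNP with MQ exactly equal to 30.00 would be removed under this sketch.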