INTRODUCTION
In the omics era, the cost of sequencing and the time required to obtain
useful information from different organisms, even uncultured ones, have
been reduced drastically by technological advances (Ekblom & Galindo,
2011), broadening the scientific scope worldwide. While traditional
studies covered a single gene region and/or a pathway with a limited
number of genes, next-generation sequencing (NGS) has pushed the trend
towards whole-genome analysis and population genetics, in which the
genomes of several individuals of a species can be characterized
simultaneously (Bahassi & Stambrook, 2014). In this field, molecular
techniques such as genotyping by sequencing (GBS) and marker-assisted
selection (MAS) have gained prominence because they do not require an
available reference genome (Collard & Mackill, 2008; He et al., 2014)
and because they make it possible to characterize a whole species at
lower cost, providing comprehensive information for both evolutionary
studies and conservation efforts (Khaing et al., 2013; Siadjeu,
Mayland-Quellhorst, & Albach, 2018).
Genetic polymorphisms, such as single nucleotide polymorphisms (SNPs) and
simple sequence repeats (SSRs), also known as microsatellites, have long
served the field of population genetics (Bruford & Wayne, 1993; Helyar
et al., 2011). SSRs are short DNA motifs repeated in tandem; they occur
mostly in non-coding regions and are widely distributed throughout the
genome. Owing to their high levels of polymorphism, they are excellent
markers for genotype identification, genetic diversity assessment and
genotype-phenotype mapping, at both the species and the population level
(Morgante & Olivieri, 1993; Vieira, Santini, Diniz, & Munhoz, 2016).
Traditional methods use a single individual per species to develop the
microsatellite library, and the number of loci genotyped afterwards must
be kept small to offset the cost of microsatellite design and
optimisation.
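To make the notion of a tandemly repeated motif concrete, the Python sketch below scans a sequence for perfect (uninterrupted) di-, tri- and tetranucleotide repeats using a regular-expression backreference. It is illustrative only and is not the detection method used by Micro-Primers or by any of the tools cited here; the minimum repeat counts in `MIN_REPEATS` and the names `SSR` and `find_perfect_ssrs` are hypothetical choices.

```python
import re
from typing import NamedTuple

# Hypothetical minimum number of motif copies per motif length (bp);
# illustrative thresholds, not values taken from Micro-Primers.
MIN_REPEATS = {2: 6, 3: 5, 4: 5}


class SSR(NamedTuple):
    motif: str
    repeats: int
    start: int  # 0-based start of the repeat tract
    end: int    # 0-based, exclusive end of the repeat tract


def _has_shorter_period(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return any(k % p == 0 and motif == motif[:p] * (k // p) for p in range(1, k))


def find_perfect_ssrs(seq: str) -> list[SSR]:
    """Return perfect di-, tri- and tetranucleotide repeats, sorted by position."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer of A/C/G/T followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if _has_shorter_period(motif):
                continue  # skip mononucleotide runs and motifs that reduce to a shorter unit
            hits.append(SSR(motif, (m.end() - m.start()) // k, m.start(), m.end()))
    return sorted(hits, key=lambda h: h.start)


seq = "GGCT" + "CA" * 8 + "TTGC" + "GATA" * 6 + "CC"
for ssr in find_perfect_ssrs(seq):
    print(ssr)
# SSR(motif='CA', repeats=8, start=4, end=20)
# SSR(motif='GATA', repeats=6, start=24, end=48)
```

Because the regular expression reports the leftmost phase of each run, the same tract may be labelled "CA" or "AC" depending on the flanking base; dedicated SSR finders also handle imperfect and compound repeats, which this sketch ignores.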
The goal of this work is to design and implement an automated pipeline
for screening SSRs from raw paired-end reads obtained by sequencing a
single-digest library enriched by hybridization for di-, tri- and/or
tetranucleotide motifs (adapted from Garrett, Dawson, Horsburgh, &
Reynolds, 2017), but built from a multi-individual sample. The tool
detects the variation at SSR loci present in the population and designs
optimal primers for each SSR marker. Highly polymorphic markers can then
be used to genotype other individuals of the same species.
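One simple way to picture variation at SSR loci across individuals: reads whose repeat tract is bordered by the same flanking sequences are treated as alleles of one locus, and a locus showing more than one repeat count is polymorphic. The sketch below assumes the repeat coordinates of each read are already known and keys loci on exact flanks of a hypothetical fixed length (`FLANK = 10`); `group_alleles`, `polymorphic_loci` and `min_alleles` are hypothetical names, and this is not a description of how Micro-Primers assigns alleles to loci.

```python
from collections import defaultdict

# Hypothetical flank length (bp) used to anchor a locus; illustrative only.
FLANK = 10

# A locus is keyed by (motif, left flank, right flank).
LocusKey = tuple[str, str, str]


def group_alleles(observations, flank: int = FLANK) -> dict[LocusKey, list[int]]:
    """Group observed repeat tracts into loci and collect their allele sizes.

    `observations` is an iterable of (read, motif, start, end) tuples, where
    read[start:end] is a perfect repeat of `motif` (0-based, end exclusive).
    """
    loci = defaultdict(set)
    for read, motif, start, end in observations:
        left = read[max(0, start - flank):start]
        right = read[end:end + flank]
        if len(left) < flank or len(right) < flank:
            continue  # flanks too short to anchor the locus
        # Allele = number of motif copies in the tract.
        loci[(motif, left, right)].add((end - start) // len(motif))
    return {key: sorted(sizes) for key, sizes in loci.items()}


def polymorphic_loci(loci: dict[LocusKey, list[int]], min_alleles: int = 2) -> list[LocusKey]:
    """Return the loci showing at least `min_alleles` distinct repeat counts."""
    return [key for key, alleles in loci.items() if len(alleles) >= min_alleles]


left, right = "TTGACCGTAG", "GGATCCTTAC"
observations = [
    (left + "CA" * 8 + right, "CA", 10, 26),           # individual 1
    (left + "CA" * 11 + right, "CA", 10, 32),          # individual 2
    ("AAGG" + left + "CA" * 8 + right, "CA", 14, 30),  # individual 3, shifted read
    ("CCCCGGGGAA" + "GATA" * 6 + "TTTTAAAACC", "GATA", 10, 34),
]
loci = group_alleles(observations)
print(loci)
print(polymorphic_loci(loci))
```

Exact flank matching is deliberately simple: with real reads, sequencing errors or a different motif phase would split one locus into several, so a practical tool would need to tolerate mismatches, for instance by clustering flanking sequences.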
Currently, only a few programs can both detect microsatellites and
design primers for their subsequent amplification, such as SSR Pipeline
(Miller, Knaus, Mullins, & Haig, 2013), GMATA (Wang & Wang, 2016), Full
SSR (Metz, Cabrera, Rueda, Giri, & Amavet, 2016) and CandiSSR (Xia et
al., 2016). However, to the best of our knowledge, none of them takes
into consideration several parameters that are important for
multi-individual SSR identification. Although some of them can extract
SSRs from sequence data and design primers (e.g., Full SSR and GMATA),
they do not check, for example, whether an allele belongs to an SSR
locus already present in the dataset. Micro-Primers (available at the
GitHub repository https://github.com/FilAlves/micro-primers) integrates
a set of external programs into an automated pipeline and handles the
conversion of input/output formats between them, so that they
communicate seamlessly. Micro-Primers therefore provides a unique,
easy-to-use framework for microsatellite characterization.