INTRODUCTION

In the omics era, advances in technology have drastically reduced the cost of sequencing and the time required to obtain useful information from different organisms, even uncultured ones (Ekblom & Galindo, 2011), broadening the scope of research worldwide. While traditional studies covered a gene region and/or a pathway with a limited number of genes, next-generation sequencing (NGS) has pushed the trend towards whole-genome analysis and population genetics, where the genomes of several individuals of a species can be characterized at the same time (Bahassi & Stambrook, 2014). In this field, molecular techniques such as genotyping by sequencing (GBS) and marker-assisted selection (MAS) have gained prominence because they do not require an available reference genome (Collard & Mackill, 2008; He et al., 2014) and because they make it possible to characterize a whole species at lower cost, providing comprehensive information for both evolutionary studies and conservation efforts (Khaing et al., 2013; Siadjeu, Mayland-Quellhorst, & Albach, 2018).
Genetic polymorphisms, such as single nucleotide polymorphisms (SNPs) or simple sequence repeats (SSRs), also known as microsatellites, have long served the field of population genetics (Bruford & Wayne, 1993; Helyar et al., 2011). SSRs are tandemly repeated DNA motifs that occur mainly in non-coding regions and are distributed throughout the genome. Owing to their high levels of polymorphism, they are excellent markers for genotype identification, genetic diversity analysis and genotype-phenotype mapping, at both species and population levels (Morgante & Olivieri, 1993; Vieira, Santini, Diniz, & Munhoz, 2016). Traditional methods use a single individual per species for microsatellite library development, and the number of microsatellite loci genotyped afterwards has to be limited because of the cost associated with microsatellite design and optimisation.
The goal of this work is to design and implement an automated pipeline for screening SSRs from raw paired-end reads generated from a single-digest library enriched by hybridization for di-, tri- and/or tetranucleotide motifs (adapted from Garrett, Dawson, Horsburgh, & Reynolds, 2017), but using a multi-individual sample. The tool detects the variation at SSR loci present in the population and designs optimal primers for each SSR marker. Highly polymorphic markers can then be used to genotype other individuals of the same species.
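To make the notion of screening reads for di-, tri- and tetranucleotide motifs concrete, the following minimal Python sketch finds perfect SSRs in a single sequence with regular expressions. It is an illustration only and does not reproduce how Micro-Primers detects SSRs; the function name `find_ssrs` and the minimum repeat thresholds in `MIN_REPEATS` are assumptions chosen for the example.

```python
import re

# Hypothetical thresholds (assumption): minimum number of tandem copies
# required for each motif length to be reported as an SSR.
MIN_REPEATS = {2: 6, 3: 5, 4: 5}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect di-, tri- and
    tetranucleotide repeats found in a DNA sequence."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip homopolymer runs such as "AA" or "AAA".
            if len(set(motif)) == 1:
                continue
            # Skip tetranucleotide motifs that are a doubled dinucleotide
            # (e.g. "ACAC"), which are already reported as dinucleotide SSRs.
            if motif_len == 4 and motif[:2] == motif[2:]:
                continue
            repeat_count = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, repeat_count))
    return hits


# Example: a read carrying an (AC)8 repeat between short flanking regions.
print(find_ssrs("TTG" + "AC" * 8 + "GGT"))
```

In a multi-individual sample, differences in the repeat count reported for the same flanking sequence across reads would hint at allelic variation at that locus, which is the kind of signal the pipeline is meant to capture.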
Currently, few programs can both detect microsatellites and design primers for later amplification, such as SSR Pipeline (Miller, Knaus, Mullins, & Haig, 2013), GMATA (Wang & Wang, 2016), Full SSR (Metz, Cabrera, Rueda, Giri, & Amavet, 2016) and CandiSSR (Xia et al., 2016). However, to the best of our knowledge, none of them takes into consideration several parameters that are important for multi-individual SSR identification. Although some of them can extract SSRs from sequence data and design primers (e.g., Full SSR and GMATA), they do not consider whether, for example, an allele belongs to an SSR locus already present in the dataset. Micro-Primers (available at the GitHub repository https://github.com/FilAlves/micro-primers) integrates a set of external programs into an automated pipeline, ensuring communication between them by converting their input/output formats. Thus, Micro-Primers represents a unique and easy-to-use framework for microsatellite characterization.