MATERIALS AND METHODS

The main motivation for Micro-Primers is to reduce the time and computational work involved in the manual selection of microsatellite candidates. To this end, several tools and scripts were integrated within Micro-Primers to discover SSRs and design the respective primers for subsequent in vitro amplification.

Internal & External Components

The Micro-Primers pipeline was written in Python version 3.6. Its two main internal components are the following scripts: (i) install.py, which handles all the prerequisites for a proper installation of Micro-Primers, and (ii) micro-primers.py, the main script that defines the pipeline. Analysis settings are described in the config.txt file, and users can modify the parameters according to their own needs. The folder software, provided together with the Python scripts, holds all the scripts and external software employed by micro-primers.py.
The Micro-Primers pipeline integrates several external components: (i) Trimmomatic (Bolger, Lohse, & Usadel, 2014) for the removal of the sequencing adapters; (ii) Cutadapt (Martin, 2011) for the removal of the technology-specific adapter; (iii) FLASH (Magoč & Salzberg, 2011) for the merging of paired-end reads (R1 and R2); (iv) MISA (Thiel, Michalek, Varshney, & Graner, 2003) for SSR searching; (v) CD-HIT (Fu, Niu, Zhu, Wu, & Li, 2012; Li & Godzik, 2006) for the removal of redundancy; and (vi) Primer3 (Rozen & Skaletsky, 2000) for primer design.

Input Files & Pipeline

To run Micro-Primers, users only need to provide two FASTQ files corresponding to the two ends of a paired-end sequencing run. Samples should come from a pool of (untagged) individuals of the same species so that microsatellite selection can be optimized. SSR selection is based on the number of alleles at each SSR locus, so the more heterogeneous the sample is (i.e., containing individuals from distinct populations across the species distribution), the better the final result will be. Reads must come from a microsatellite library built using a restriction enzyme and following an enrichment protocol such as the one described in Garrett et al. (2017). The enrichment protocol is performed after digestion so that the target SSR motifs are the most represented strands in the final library. A fragment size selection is then performed on the enriched library to keep only fragments with an average length below the maximum sequencing length, which allows both paired-end reads to overlap when they are merged later on. The final fragment size is important for microsatellite screening: each fragment must comprise the full SSR pattern (variable in length) and two flanking regions long enough for primer design.
Additionally, prior to executing Micro-Primers, users must install all the external components and set the environment variables through the script install.py. Users must also check the config.txt file (the configuration parameters are described in the next subsection) before executing the main script (micro-primers.py). Upon execution, Micro-Primers follows the flowchart described in Figure 1. It begins by using Trimmomatic and Cutadapt to remove the sequencing and technology-specific adapters, respectively, and the paired-end reads are then merged with FLASH. Only sequence reads containing the restriction enzyme pattern are kept by the pipeline. Various parameters are then calculated, and only the sequences that comply with the user's specifications are selected. Next, the repeat region of each SSR is removed from the sequences, and the flanking regions are aligned and assigned to a cluster using CD-HIT with the parameters -c 0.95 and -n 10.
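The read-filtering and repeat-removal steps can be illustrated with a minimal Python sketch. This is not the Micro-Primers code: the function names keep_read and strip_repeat, the simple substring test for the restriction pattern, the example pattern GATC, and the assumption that the SSR start and length are already known (for instance from the SSR search output) are all hypothetical choices made for illustration.

```python
def keep_read(seq, restriction_pattern):
    """Return True if the read contains the restriction enzyme pattern.

    Assumption: a plain substring match anywhere in the read is sufficient.
    """
    return restriction_pattern.upper() in seq.upper()


def strip_repeat(seq, ssr_start, ssr_len):
    """Remove the SSR repeat region and return the (left, right) flanks.

    ssr_start is the 0-based position of the first repeat base and
    ssr_len is the total length of the repeat region (both hypothetical
    inputs, assumed to come from a previous SSR search).
    """
    left = seq[:ssr_start]
    right = seq[ssr_start + ssr_len:]
    return left, right


# Hypothetical merged read: restriction pattern, left flank, (AC)5, right flank.
read = "GATCAAGGTTCCACACACACACTTGGAACC"
if keep_read(read, "GATC"):
    left_flank, right_flank = strip_repeat(read, ssr_start=12, ssr_len=10)
    # The joined flanks are what a clustering step would compare.
    flanks_only = left_flank + right_flank
```

Clustering on the joined flanks rather than on the full read means that two reads carrying different repeat numbers at the same locus can still fall into the same cluster.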
Sequences belonging to the same cluster are sorted, and the number of different alleles in each cluster is computed. Only clusters with a minimum number of alleles (set by the user in the config.txt file) are chosen, and a random sequence among the variants is selected as the representative of each SSR locus. Every representative is then passed to Primer3, and an output file containing both the primer information and the number of alleles for each sequence is created according to the primer specifications set by the user.
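A minimal sketch of this cluster-level selection is given below. It assumes that alleles within a cluster are distinguished by their repeat count and that each sequence is available as a (cluster_id, seq_id, repeat_count) tuple; the function name select_loci, the record layout and the seed argument are hypothetical and only serve the illustration.

```python
import random
from collections import defaultdict


def select_loci(records, min_alleles, seed=None):
    """Pick one random representative per cluster with enough alleles.

    records: iterable of (cluster_id, seq_id, repeat_count) tuples
             (hypothetical layout).
    Returns {cluster_id: (representative_seq_id, number_of_alleles)}.
    """
    rng = random.Random(seed)  # seed only makes the random choice repeatable
    clusters = defaultdict(list)
    for cluster_id, seq_id, repeat_count in records:
        clusters[cluster_id].append((seq_id, repeat_count))

    selected = {}
    for cluster_id, members in sorted(clusters.items()):
        # Assumption: each distinct repeat count is one observed allele.
        alleles = {repeat_count for _, repeat_count in members}
        if len(alleles) >= min_alleles:
            representative = rng.choice(members)[0]
            selected[cluster_id] = (representative, len(alleles))
    return selected


example = [
    ("c1", "read1", 8),
    ("c1", "read2", 10),
    ("c1", "read3", 10),
    ("c2", "read4", 5),
]
loci = select_loci(example, min_alleles=2, seed=0)
```

In this example only cluster c1 is retained, with two observed alleles (8 and 10 repeats); cluster c2 has a single allele and is dropped.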

Execution Parameters

As described previously, all the parameters that Micro-Primers needs to perform the analysis properly must be duly set in the config.txt file. This file contains four sections with different parameters to be considered for the pipeline execution:
In the first section (Input Files), the user has to indicate the names of the paired-end files that will be used as input in the analysis.
In the second section (CUTADAPT), the sequence of the adapters used after the restriction enzyme digestion is required. These adapters are necessary to transform the longer overhangs into blunt ends after the enzyme digestion. Only sequences with these adapters are considered 'true' digested sequences.
In the third section (SSR), several parameters control the microsatellite selection. The parameter MIN_FLANK_LEN indicates the minimum length accepted for both flanking regions, where the primers will be designed. The length of the flanking regions is critical to the final outcome, since a very narrow window prevents the design of primers and subsequently causes the exclusion of the respective SSR. Thus, any sequence with a flanking region (at either end) shorter than the specified length will be discarded. The parameter MIN_MOTIF_REP sets the minimum number of repeats that every SSR locus must have to be kept in the pipeline. In addition, specific SSR motifs can be discarded from the output if indicated in the EXC_MOTIF_TYPE parameter. Options for this parameter are c (compound), c* (compound with imperfection) and p1 to p6 (repeated motif of 1 to 6 nucleotides) (Thiel et al., 2003). The motifs to be discarded should be separated by commas. The MIN_ALLEL_CNT option indicates the minimum number of alleles required for an SSR locus to be selected, and it is based on the observed alleles. In contrast, the parameter MIN_ALLEL_SPECIAL_DIF indicates the minimum potential number of alleles desired for each locus, taking into account that not all alleles are represented in the multi-individual sample. This potential number is based on the difference between the alleles with the highest and lowest number of repeats: only loci for which this difference satisfies the minimum indicated in MIN_ALLEL_SPECIAL_DIF are kept. The parameter MIN_ALLEL_SPECIAL is used to enable (=1) or disable (=0) this option.
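The two allele criteria can be sketched in Python as follows. The text does not give the exact formula, so this sketch makes two explicit assumptions: the potential number of alleles is taken as the plain difference between the highest and lowest repeat counts in a cluster, and enabling the special option replaces the observed-allele criterion rather than being combined with it. The function name passes_allele_filter is hypothetical; its arguments simply mirror the configuration parameters in lowercase.

```python
def passes_allele_filter(repeat_counts, min_allel_cnt,
                         min_allel_special=0, min_allel_special_dif=0):
    """Decide whether a locus is kept, given the repeat counts of its reads.

    Assumptions (not confirmed by the text):
      * observed alleles = distinct repeat counts;
      * potential alleles = max repeat count - min repeat count;
      * when min_allel_special == 1 the range criterion is used instead
        of the observed-allele criterion.
    """
    if not repeat_counts:
        return False  # nothing observed, nothing to keep
    if min_allel_special == 1:
        spread = max(repeat_counts) - min(repeat_counts)
        return spread >= min_allel_special_dif
    observed = len(set(repeat_counts))
    return observed >= min_allel_cnt


# A locus seen with 8, 8 and 14 repeats has two observed alleles
# but a spread of 6 repeats between its extreme alleles.
by_count = passes_allele_filter([8, 8, 14], min_allel_cnt=3)
by_range = passes_allele_filter([8, 8, 14], min_allel_cnt=3,
                                min_allel_special=1, min_allel_special_dif=5)
```

Under these assumptions the example locus fails the observed-allele criterion (two alleles, three required) but passes the range criterion, which is the situation the special option is meant to capture when intermediate alleles are missing from the sample.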
Finally, in the fourth section (PRIMER3), the config.txt file is used to configure Primer3; the only requirement is to indicate the path to the Primer3 settings file containing the standard Primer3 parameters. However, these parameters can be changed according to the user's needs, e.g., PRIMER_PRODUCT_SIZE_RANGE, PRIMER_OPT_SIZE, PRIMER_OPT_TM and/or PRIMER_MAX_POLY_X, among others (all parameters are listed at https://primer3.org/manual.html). PCR amplification primers are usually designed with a length of 20-25 nucleotides, and some particular features are required to avoid problems during genotyping (Dieffenbach, Lowe, & Dveksler, 1993; Flores-Rentería & Krohn, 2013), such as the presence of G or C at the 3’ end, a suitable GC percentage for a proper melting temperature, and similar melting temperatures for both primers so that they hybridize at the same time.
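The primer features listed above can be checked with a short Python sketch. It is an illustration only and does not reproduce how Primer3 evaluates primers: the function names, the GC range of 40-60% and the maximum melting temperature difference of 2 °C are hypothetical defaults, and the melting temperatures are taken as inputs (for example, as reported by Primer3) rather than computed here.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in a non-empty primer sequence."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def has_gc_at_3prime(primer):
    """True if the last (3') base of the primer is G or C."""
    return primer[-1].upper() in ("G", "C")


def check_pair(fwd, rev, tm_fwd, tm_rev, min_len=20, max_len=25,
               gc_range=(0.4, 0.6), max_tm_diff=2.0):
    """Return a list of problems found in a primer pair (empty if none).

    All thresholds are hypothetical defaults chosen for illustration.
    """
    problems = []
    for name, primer in (("forward", fwd), ("reverse", rev)):
        if not min_len <= len(primer) <= max_len:
            problems.append(f"{name}: length {len(primer)} outside {min_len}-{max_len}")
        if not has_gc_at_3prime(primer):
            problems.append(f"{name}: no G or C at the 3' end")
        if not gc_range[0] <= gc_fraction(primer) <= gc_range[1]:
            problems.append(f"{name}: GC fraction outside {gc_range}")
    if abs(tm_fwd - tm_rev) > max_tm_diff:
        problems.append("pair: melting temperatures differ too much")
    return problems


# Hypothetical primer pair with supplied melting temperatures.
issues = check_pair("ACGTACGTACGTACGTACGC", "TTGCAAGGCTTACGGATCAG", 60.0, 61.0)
```

Returning a list of problems rather than a single boolean makes it easier to see which constraint should be relaxed in the Primer3 settings when too few loci yield primers.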