DISCUSSION

At present, it can take laboratories and researchers months to analyze and select SSR loci and to design primers for their projects. The manual workflow requires several MISA runs without any pre-processing, followed by manual selection of candidate sequences for Primer3. In a multi-individual assay, most selected SSR loci and the subsequent PCR amplifications are likely to perform poorly if SSRs are selected manually, or if the clustering of flanking regions and the number of putative alleles are not considered, which yields markers with low variability across populations. These technical constraints cost time and money, and they could be easily overcome by an automatic tool that selects loci based on their polymorphism and designs suitable primers. Such tools allow users to test different parameters and select the most suitable SSRs for their experiment in a few seconds.
Specifically, the automatic Micro-Primers pipeline offers a clear advantage because it removes a few time-consuming steps of the manual search, such as editing input and output files between programs and categorizing and counting alleles for every single SSR. First, all individually identified microsatellites need to be grouped by their ‘true’ locus, taking into account that they come from different individuals and can be represented in multiple sequences. Once all copies of the same microsatellite locus have been identified, the user needs to count the number of unique alleles for the subsequent selection. In some cases this task is very tedious, as some loci can be represented by thousands of sequences. Micro-Primers has a default configuration that has been shown to achieve a good balance between the number of loci and the number of primers designed; however, users should modify these parameters to adapt the results to their requirements. For example, the SPECIAL parameter, which takes into account the maximum and minimum alleles found at each SSR locus, brings into the decision-making the hypothesis that alleles not yet discovered for the species may exist, possibly because of insufficient sampling.
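The grouping and allele-counting step described above can be illustrated with a minimal Python sketch. It is not the Micro-Primers implementation: the function names, the `(cluster_id, repeat_count)` input format, and the `min_alleles` and `use_range` arguments are hypothetical. In particular, `use_range` is only one possible reading of the SPECIAL parameter, assuming that every repeat count between the minimum and maximum observed allele is treated as a putative allele.

```python
from collections import defaultdict


def count_alleles(records):
    """Group SSR observations by locus and collect their distinct alleles.

    `records` is an iterable of (cluster_id, repeat_count) pairs, one per
    sequence. The cluster id stands for the locus (hypothetical format).
    """
    alleles = defaultdict(set)
    for cluster_id, repeat_count in records:
        alleles[cluster_id].add(repeat_count)
    return dict(alleles)


def select_loci(alleles, min_alleles=2, use_range=False):
    """Keep loci whose allele count reaches `min_alleles`.

    With use_range=True (an assumed interpretation, see lead-in), the count
    is the span between the smallest and largest observed allele, so that
    unobserved intermediate alleles are also counted.
    """
    selected = {}
    for locus, repeats in alleles.items():
        if use_range:
            n = max(repeats) - min(repeats) + 1
        else:
            n = len(repeats)
        if n >= min_alleles:
            selected[locus] = n
    return selected


# Toy data: three loci observed across several individuals.
records = [
    ("c1", 8), ("c1", 8), ("c1", 10), ("c1", 12),
    ("c2", 5), ("c2", 5),
    ("c3", 6), ("c3", 9),
]
alleles = count_alleles(records)
observed = select_loci(alleles, min_alleles=3)
ranged = select_loci(alleles, min_alleles=3, use_range=True)
print(observed)
print(ranged)
```

In this toy example, locus c3 has only two observed alleles and is rejected by the plain count, but it passes the range-based count because its alleles are three repeat units apart.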
However, we advise users not to treat the final output of Micro-Primers as ready to use, but to scrutinise it before any laboratory work. Despite the imposed settings, the output from Primer3 sometimes includes poor primer sequences (with repeated nucleotides) that the user should discard immediately. Furthermore, the tests performed revealed other sources of variation, which produce slight differences in the results when the process is replicated. The impact of these biases on the final output should not be regarded as a performance deficiency of Micro-Primers, since they are intrinsic to the individual programs used and may also occur during manual processing. First, since most SSR sequences fall in the overlapping region, FLASH can introduce some length variation while merging the paired-end reads, because it cannot recognise the exact position within the repeated pattern where the sequences should be merged, thus creating ‘fake’ alleles. The second source of variation arises in CD-HIT, because the creation of clusters based on the flanking regions is somewhat random: the group of sequences composing each cluster depends on the seed sequence selected to originate it. This variation can be explained by amplification and sequencing errors. The last source of variation is the random selection of a cluster representative for primer design. A cluster can be discarded if its representative cannot produce good enough primer pairs in Primer3, for example if the product size falls outside the defined range; however, this effect was found to be marginal.
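The manual check for primers with repeated nucleotides mentioned above could be partly automated. The following Python sketch assumes that "repeated nucleotides" means homopolymer runs (the same base repeated consecutively); the function names and the `max_run` threshold are hypothetical and are not part of Micro-Primers or Primer3.

```python
import re


def has_homopolymer_run(primer, max_run=4):
    """Return True if any base is repeated more than `max_run` times in a row."""
    # One base followed by at least `max_run` copies of itself.
    pattern = r"([ACGT])\1{%d,}" % max_run
    return re.search(pattern, primer.upper()) is not None


def filter_primer_pairs(pairs, max_run=4):
    """Keep only (forward, reverse) pairs in which neither primer has a long run."""
    return [
        (fwd, rev)
        for fwd, rev in pairs
        if not has_homopolymer_run(fwd, max_run)
        and not has_homopolymer_run(rev, max_run)
    ]


# Toy primer pairs: the second forward primer contains a run of six A's.
pairs = [
    ("ACGTTGCAAGCTTGCA", "TGCATGCCATGGACTA"),
    ("ACGAAAAAAGCTTGCA", "TGCATGCCATGGACTA"),
]
kept = filter_primer_pairs(pairs)
print(kept)
```

A filter of this kind only flags one class of poor primers, so it would complement rather than replace the visual inspection recommended above.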
Nevertheless, Micro-Primers makes candidate SSRs available more reliably and faster than before. This strategy permits the selection of the markers that are most suitable for specific applications or particular organisms. An automatic pipeline is valuable to the scientific community because it speeds up the process and overcomes the biases associated with manual processing, while allowing the user to test various parameter choices by automatically running the process several times on the same dataset.