DISCUSSION
At present, it can take laboratories and researchers months to analyze
and select SSR loci and design primers for their projects. The manual
workflow requires several MISA runs without any kind of pre-processing,
followed by manual selection of candidate sequences for Primer3. When a
multi-individual assay is used, most selected SSR loci and the
subsequent PCR amplifications are likely to be mediocre if SSRs are
selected manually, or if the clustering of flanking regions and the
number of putative alleles are not considered; the result is markers
with low variability across populations. These technical constraints
cost time and money, and they could be largely overcome by an automatic
tool that selects loci based on their polymorphism and designs suitable
primers. Such a tool allows users to test different parameters and
select the most suitable SSRs for their experiment in a few seconds.
Specifically, the automatic Micro-Primers pipeline offers a clear
advantage because it removes a few time-consuming steps of the manual
search, such as editing input and output files between programs and
categorizing and counting alleles for every single SSR. First, all
individually identified microsatellites need to be grouped by their
‘true’ locus, taking into account that they come from different
individuals and can be represented in multiple sequences. After all
copies of the same microsatellite locus have been identified, the user
needs to count the number of unique alleles for the subsequent
selection. In some cases this task can be very tedious, as some loci
are represented by thousands of elements.
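The grouping and allele-counting step described above can be sketched in a few lines of Python. This is only an illustration, not the Micro-Primers implementation: the record layout (cluster identifier, repeat motif, repeat count) and the names `count_alleles`, `select_loci` and `min_alleles` are assumptions made for the example.

```python
from collections import defaultdict


def count_alleles(records):
    """Count unique alleles per locus.

    `records` is an iterable of (cluster_id, motif, repeat_count) tuples.
    This layout is hypothetical; a locus is taken to be the combination
    of a flanking-region cluster and a repeat motif.
    """
    alleles = defaultdict(set)
    for cluster_id, motif, repeat_count in records:
        # Copies of the same allele from different reads collapse in the set.
        alleles[(cluster_id, motif)].add(repeat_count)
    return {locus: len(sizes) for locus, sizes in alleles.items()}


def select_loci(records, min_alleles=2):
    """Return the loci with at least `min_alleles` unique alleles, sorted."""
    counts = count_alleles(records)
    return sorted(locus for locus, n in counts.items() if n >= min_alleles)


if __name__ == "__main__":
    reads = [
        ("c1", "AC", 8),
        ("c1", "AC", 10),
        ("c1", "AC", 8),   # duplicate allele, counted once
        ("c2", "AG", 6),
    ]
    print(select_loci(reads, min_alleles=2))
```

Using a set per locus keeps the count independent of how many reads carry each allele, which is the property that matters when a locus is represented by thousands of elements.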
Micro-Primers has a default configuration that has been shown to
achieve a good balance between the number of loci and the number of
primers designed; however, users should modify these parameters to adapt
the results to their requirements. For example, the SPECIAL parameter,
which takes into account the maximum and minimum alleles found at an
SSR locus, brings into the decision-making the hypothesis that new
alleles remain undiscovered for the species, possibly because of
insufficient sampling.
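The text describes the SPECIAL parameter only in general terms. One possible reading, offered here as an assumption and not as the rule the pipeline actually applies, is that a locus with few observed alleles is still retained when the gap between its smallest and largest allele is wide enough to suggest unsampled intermediate alleles. All parameter names and default values below are hypothetical.

```python
def passes_allele_filter(repeat_counts, min_alleles=5,
                         special_min_alleles=3, special_min_range=8):
    """Decide whether a locus is kept, under an assumed SPECIAL-style rule.

    `repeat_counts` holds the repeat counts observed at one locus.
    The thresholds are illustrative values, not Micro-Primers defaults.
    """
    unique = set(repeat_counts)
    if not unique:
        return False
    # Standard rule: enough distinct alleles were observed.
    if len(unique) >= min_alleles:
        return True
    # Assumed special rule: fewer alleles, but a wide min-max span hints
    # that intermediate alleles may exist and were simply not sampled.
    allele_range = max(unique) - min(unique)
    return len(unique) >= special_min_alleles and allele_range >= special_min_range
```

Under these assumed thresholds, a locus with alleles of 5, 10 and 15 repeats would be kept, whereas one with 5, 6 and 7 repeats would not.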
However, we advise users not to treat the final output of
Micro-Primers as ready to use, but to scrutinise it before
laboratory work. This is because, despite the imposed settings,
the output from Primer3 sometimes includes poor primer sequences (with
repeated nucleotides) that the user should discard immediately.
Furthermore, other sources of variation have been detected in the tests
performed, producing slight differences in the results when the process
is replicated. The impact of these biases on the final output should not
be considered a performance deficiency of Micro-Primers, since they are
intrinsic to the individual programs used and may also occur during
manual processing. First, since most SSR sequences lie in the
overlapping region, FLASH can produce some length variation while
merging the paired-end reads when it does not recognise the exact
position in the repeated pattern where the sequences should be merged,
thus creating ‘fake’ alleles. The second
source of variation arises in CD-HIT, because the creation of clusters
based on the flanking regions is somewhat random: the group of
sequences composing each cluster depends on the seed sequence selected
to originate it. This variation can be explained by amplification and
sequencing errors. The last source of variation is the random selection
of a cluster representative for primer design. A cluster can be
discarded if its representative cannot produce good enough primer pairs
in Primer3, for example if the product size is outside the defined
range; however, this effect was found to be marginal.
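The manual screening of primers recommended earlier in this section can be partly automated. The sketch below assumes that "repeated nucleotides" refers to homopolymer runs (the same base repeated consecutively); the function names and the `max_run` threshold are illustrative choices and are not part of Micro-Primers or Primer3.

```python
import re


def longest_homopolymer(primer):
    """Return the length of the longest run of one repeated base."""
    # Each match of (.)\1* is one maximal run of an identical character.
    runs = re.finditer(r"(.)\1*", primer.upper())
    return max((len(m.group(0)) for m in runs), default=0)


def flag_primers(primers, max_run=4):
    """Return the primers whose longest run exceeds `max_run`.

    `max_run` is an assumed threshold; choose it to suit the assay.
    """
    return [p for p in primers if longest_homopolymer(p) > max_run]


if __name__ == "__main__":
    candidates = ["ACGTACGTAC", "AAAAAACGTG", "acggggtc"]
    print(flag_primers(candidates, max_run=4))
```

A check of this kind only flags candidates for inspection; the final decision on each primer pair still rests with the user.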
Nevertheless, Micro-Primers makes candidate SSRs available on an
unprecedented scale, more reliably and faster than before. This
strategy permits selection of the markers that are most suitable for
specific applications or particular organisms. An automatic pipeline is
valuable to the scientific community,
since it can speed up the process and overcome the biases
associated with the manual processing, while allowing the user to test
various parameter choices by automatically running the process several
times on the same dataset.