RESULTS

In this section, we present the experimental results obtained by applying our pipeline to a real case study, following the same procedure that would be carried out in a manual analysis.

Micro-Primers’ Output

Running the Micro-Primers pipeline produces a single plain-text output file containing the information needed to amplify each SSR locus from its cluster representative. Figure 2 shows a sample of this file and how it is organized. The file has eleven columns, and each line corresponds to a primer pair designed by Primer3 for an SSR recovered from the multi-individual sample. From left to right, the first column (in red) contains the representative sequence of each cluster, preceded by a unique index that makes it easy to identify (sequence ID). Lines that share the same sequence ID are alternative primer pairs for the same SSR locus. The second column gives the size (length) of the PCR product obtained with the corresponding primer pair. The third and fourth columns contain the forward primer sequence and its melting temperature, and the fifth and sixth columns give the same information for the reverse primer. The seventh column shows the specific motif/allele found, together with its number of repeats in the SSR representative. The eighth column, labelled ‘Range’, gives the length range of the PCR amplicon across all alleles detected for the same SSR. The ninth column contains the total number of alleles observed for the SSR locus, and the tenth indicates the potential number of alleles to be found in the population, estimated from the difference between the longest and shortest alleles found. Finally, the eleventh column marks the best primer pair for each locus (coded as “| BEST |”), as provided by Primer3.
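For downstream use it can be convenient to load this file programmatically. The following is a minimal Python sketch, assuming that the eleven columns are tab-separated, that the file has no header line, and that the eleventh column is empty except for the “| BEST |” flag. The delimiter, the class `PrimerPair` and the functions `parse_file` and `group_by_locus` are hypothetical and are not taken from the Micro-Primers code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class PrimerPair:
    # Hypothetical field names mirroring the eleven columns described above.
    seq_id: str
    amplicon_size: int
    fwd_primer: str
    fwd_tm: float
    rev_primer: str
    rev_tm: float
    motif: str
    size_range: str
    n_alleles: int
    potential_alleles: int
    best: bool


def parse_line(line: str) -> PrimerPair:
    # Assumption: tab-separated fields; a missing trailing BEST column is tolerated.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 10:
        fields.append("")
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}")
    return PrimerPair(
        seq_id=fields[0],
        amplicon_size=int(fields[1]),
        fwd_primer=fields[2],
        fwd_tm=float(fields[3]),
        rev_primer=fields[4],
        rev_tm=float(fields[5]),
        motif=fields[6],
        size_range=fields[7],
        n_alleles=int(fields[8]),
        potential_alleles=int(fields[9]),
        best="BEST" in fields[10],
    )


def parse_file(lines: Iterable[str]) -> List[PrimerPair]:
    # Skip blank lines; every other line is one primer pair.
    return [parse_line(line) for line in lines if line.strip()]


def group_by_locus(pairs: List[PrimerPair]) -> Dict[str, List[PrimerPair]]:
    # Lines sharing a sequence ID are alternative primer pairs for one SSR locus.
    groups: Dict[str, List[PrimerPair]] = defaultdict(list)
    for pair in pairs:
        groups[pair.seq_id].append(pair)
    return dict(groups)
```

Typical use would be `pairs = parse_file(open(path))` followed by `[p for p in pairs if p.best]` to keep only the pairs flagged as best for each locus.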

Performance Analysis

The Micro-Primers pipeline was tested with a dataset belonging to bats from two different populations, Namibia and Botswana, with a total of 15 and 21 individuals respectively (the dataset is also available in the GitHub repository together with the Micro-Primers’ software). Samples were pooled, enriched for di- and tetra-repeat motifs separately following the protocol established by Garrett et al. (2017) and sequenced together on an Illumina MiSeq v2 kit (250 cycles, paired-end) targeting 300k reads.
Since the species is diploid and 36 individuals (15 + 21) were sampled, the maximum number of alleles that can be found per locus is 2 × 36 = 72. The process was reproduced both by manually running each required program in sequence and by executing the Micro-Primers pipeline with exactly the same parameters as the manual run.
As expected, both procedures produced identical results; the only difference was the time needed to complete them. The manual process took at least 24 hours, most of it spent on the manual selection of clusters. Some input-format changes were also required for certain programs to work properly. For Primer3, for example, sequence identifiers were modified to include an index at the beginning to facilitate handling, with the aim of avoiding the problems some software has with long and redundant sequence names. In contrast, the automated pipeline completed the entire analysis in under 2 minutes on a single core of an eight-core Intel i7 processor with 64 GB of main memory. The only memory-demanding step is trimming, carried out by the Trimmomatic component, so overall the pipeline requires minimal resources.
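The identifier change described above can be sketched as follows. This is a minimal Python illustration, not the pipeline's actual code: it assumes FASTA input, and the naming scheme (running index, underscore, first header token truncated to a hypothetical `max_name_len`) is our own choice. Keeping the returned mapping allows the original names to be restored after Primer3 has run.

```python
from typing import Dict, Iterable, List, Tuple


def index_fasta_ids(
    lines: Iterable[str], max_name_len: int = 20
) -> Tuple[List[str], Dict[str, str]]:
    """Prefix each FASTA header with a running index and shorten it.

    Returns the rewritten lines and a mapping new_id -> original header.
    """
    renamed: List[str] = []
    mapping: Dict[str, str] = {}
    index = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            index += 1
            original = line[1:].strip()
            tokens = original.split()
            # Keep only the first header token, truncated, to avoid long names.
            short = tokens[0][:max_name_len] if tokens else "seq"
            new_id = f"{index}_{short}"
            mapping[new_id] = original
            renamed.append(">" + new_id)
        else:
            renamed.append(line)  # sequence lines are left unchanged
    return renamed, mapping
```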
In addition, four parameter configurations were tested to assess the performance of the pipeline and to evaluate differences in the number of microsatellite loci detected. The pipeline's execution was modified by changing parameters in the configuration file or in the Primer3 settings file, and the number of sequences remaining after each step is presented in Table 1. The four configurations were: (i) the default; (ii) activation of the special search with a minimum difference of 8 between the extreme alleles; (iii) a change in the flanking region length; and (iv) a change in the maximum melting temperature difference between forward and reverse primers in Primer3. As Table 1 shows, the numbers of sequences meeting the requirements in the first four pipeline steps are identical across configurations, since none of the tested parameters act at these steps. The pipeline output diverges after Filter 2, depending on the configuration used.
The special feature MIN ALLEL SPECIAL DIF, based on the potential number of alleles per locus, has a substantial impact on the final number of loci retained and, consequently, on the number of primers selected, compared with the default setting based on the observed number of alleles. When the special parameter is activated with a minimum difference of 8 between the extreme alleles, the number of SSR loci increases from 26 to 104, yielding a total of 83 primer pairs.
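The difference between the two selection criteria can be illustrated with a minimal Python sketch. It assumes that alleles are expressed as repeat counts, that the potential number of alleles is every repeat count between the shortest and longest observed allele, and that the special mode keeps a locus when the difference between the extreme alleles reaches the threshold. The function and parameter names (`keep_locus`, `min_alleles`, `min_special_dif`) are hypothetical, and the exact rule implemented in Micro-Primers may differ.

```python
from typing import Sequence


def potential_alleles(repeat_counts: Sequence[int]) -> int:
    # Assumption: every repeat count between the extremes is a possible allele.
    return max(repeat_counts) - min(repeat_counts) + 1


def keep_locus(
    repeat_counts: Sequence[int],
    min_alleles: int,
    special: bool = False,
    min_special_dif: int = 8,
) -> bool:
    if special:
        # Special mode: judge the locus by the spread between extreme alleles.
        return max(repeat_counts) - min(repeat_counts) >= min_special_dif
    # Default mode: judge the locus by the number of distinct observed alleles.
    return len(set(repeat_counts)) >= min_alleles
```

Under these assumptions, a locus observed with only three distinct alleles (10, 12 and 18 repeats) fails a default threshold of four observed alleles but passes the special criterion, since 18 - 10 = 8. Loci of this kind are what the special search recovers.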
Varying the minimum flanking region length at Filter 2 affects the number of sequences that pass to the following steps and, therefore, the final number of SSR markers. Higher values of the flanking region parameter make the filter more restrictive, so fewer sequences are kept. A compromise is therefore needed between the flanking region length and the ability of Primer3 to design primers under the given parameter settings. The shorter the required flanking regions, the more sequences pass Filter 2, but most of them will not be processed successfully by Primer3, because their flanks are too short for primers to be designed without overlapping the microsatellite region.
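This trade-off can be made concrete with a small Python sketch. It assumes 0-based, half-open SSR coordinates and uses a hypothetical minimum primer length of 18 nt; the function names and this value are illustrative and are not taken from Micro-Primers or from its Primer3 settings.

```python
from typing import Tuple


def flank_lengths(seq_len: int, ssr_start: int, ssr_end: int) -> Tuple[int, int]:
    # The SSR occupies [ssr_start, ssr_end) in 0-based, half-open coordinates.
    if not 0 <= ssr_start <= ssr_end <= seq_len:
        raise ValueError("SSR coordinates outside the sequence")
    return ssr_start, seq_len - ssr_end


def passes_filter2(seq_len: int, ssr_start: int, ssr_end: int, min_flank: int) -> bool:
    # Flanking-length filter: both flanks must be at least min_flank long.
    left, right = flank_lengths(seq_len, ssr_start, ssr_end)
    return left >= min_flank and right >= min_flank


def room_for_primers(
    seq_len: int, ssr_start: int, ssr_end: int, min_primer_len: int = 18
) -> bool:
    # A primer must not overlap the SSR, so each flank must hold a whole primer.
    left, right = flank_lengths(seq_len, ssr_start, ssr_end)
    return left >= min_primer_len and right >= min_primer_len
```

With a permissive `min_flank` of 10, a read whose SSR leaves 10-nt flanks passes the filter but still offers no room for an 18-nt primer on either side, which mirrors the behaviour described above.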
At the end of the pipeline, changing the maximum melting temperature difference between primers in Primer3 (MAX DIFF TM) alters the number of primer pairs designed, as expected. Higher values of this parameter increase the ability of Primer3 to find primer pairs in a sequence but, contrary to expectations, these pairs are not necessarily the most suitable, and therefore fewer primers are selected at the end.
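For reference, the following Python sketch builds a Primer3 input record in Boulder-IO format that marks the SSR as a target, so that primers flank it rather than overlap it, and sets the maximum melting temperature difference. It assumes Primer3 2.x, where this setting is named `PRIMER_PAIR_MAX_DIFF_TM` (older releases used `PRIMER_MAX_DIFF_TM`). The helper name `primer3_record` and its defaults are hypothetical, and the remaining parameters of the Micro-Primers settings file are not reproduced.

```python
def primer3_record(
    seq_id: str,
    template: str,
    ssr_start: int,
    ssr_len: int,
    max_diff_tm: float,
    num_return: int = 5,
) -> str:
    """Return one Boulder-IO record for primer3_core, terminated by '='."""
    tags = [
        ("SEQUENCE_ID", seq_id),
        ("SEQUENCE_TEMPLATE", template),
        # start,length of the region the primers must flank (0-based by default).
        ("SEQUENCE_TARGET", f"{ssr_start},{ssr_len}"),
        # Maximum allowed Tm difference between left and right primers.
        ("PRIMER_PAIR_MAX_DIFF_TM", f"{max_diff_tm:.1f}"),
        ("PRIMER_NUM_RETURN", str(num_return)),
    ]
    return "".join(f"{key}={value}\n" for key, value in tags) + "=\n"
```

Records written this way can be concatenated into one file and passed to `primer3_core` on standard input; regenerating them with a larger `max_diff_tm` corresponds to configuration (iv).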