MCC has previously been evaluated as one of the most reliable, universal, and
informative metrics in machine learning and bioinformatics problems40–42. We incorporated MCC into the training loss to
address the class imbalance inherent in protein SS prediction and to improve
the results on rare structures. The ablation study in Supplementary
Table S1 suggests that this goal was achieved, as the metrics improved
on TEST2018.
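The exact formulation of the MCC-based loss term follows the paper; as a rough, hypothetical illustration of how MCC can be made differentiable for training, the sketch below computes a one-vs-rest soft MCC per class from softmax probabilities and turns its mean into a loss (the function name and formulation are illustrative, not the authors' implementation).

```python
import tensorflow as tf

def soft_mcc_loss(y_true, y_pred, eps=1e-7):
    """Hypothetical one-vs-rest soft MCC loss (illustration only).

    y_true: one-hot labels, shape (batch, length, n_classes)
    y_pred: softmax probabilities of the same shape
    """
    # Sum soft confusion-matrix entries over all axes except the class axis
    axes = tf.range(tf.rank(y_true) - 1)
    tp = tf.reduce_sum(y_true * y_pred, axis=axes)
    fp = tf.reduce_sum((1.0 - y_true) * y_pred, axis=axes)
    fn = tf.reduce_sum(y_true * (1.0 - y_pred), axis=axes)
    tn = tf.reduce_sum((1.0 - y_true) * (1.0 - y_pred), axis=axes)

    # Per-class soft MCC in [-1, 1]
    numerator = tp * tn - fp * fn
    denominator = tf.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) + eps
    mcc = numerator / denominator

    # Lower loss corresponds to a higher mean per-class MCC
    return 1.0 - tf.reduce_mean(mcc)
```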
We used the Adam optimizer 43 with a batch size of 8 and an
initial learning rate of 0.001. The learning rate was reduced by a
factor of 0.1 when the validation loss did not improve for 4
epochs. Training was stopped when the validation loss did not
improve for 6 epochs, and the checkpoint with the lowest
validation loss across all epochs was selected as the final ProteinUnetLM
model.
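In TensorFlow/Keras terms, this schedule maps directly onto the standard ReduceLROnPlateau, EarlyStopping, and ModelCheckpoint callbacks. The sketch below only illustrates the described settings; the model, loss, data arrays, checkpoint path, and epoch limit are placeholders rather than the authors' actual training script.

```python
import tensorflow as tf

# Placeholders: `build_model`, `soft_mcc_loss`, and the data arrays stand in
# for the actual ProteinUnetLM architecture, loss, and training data.
model = build_model()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=soft_mcc_loss)  # stand-in for the loss described above

callbacks = [
    # Reduce the learning rate by a factor of 0.1 after 4 epochs
    # without improvement in the validation loss
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=4),
    # Stop training after 6 epochs without improvement
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=6),
    # Keep only the checkpoint with the lowest validation loss
    tf.keras.callbacks.ModelCheckpoint("proteinunetlm_best.h5",
                                       monitor="val_loss", save_best_only=True),
]

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          batch_size=8,
          epochs=100,          # upper bound only; early stopping ends training sooner
          callbacks=callbacks)
```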
ProteinUnetLM was implemented in a Python 3.8 environment
with TensorFlow 2.9, accelerated by CUDA 11.7 and cuDNN 8. The inference
code and trained models are available on the CodeOcean platform
(https://codeocean.com/capsule/7112101), ensuring high
reproducibility of the results. An easy-to-use web interface is
accessible on Biolib (https://biolib.com/SUT/ProteinUnetLM/). The
training code can be run in a Google Colab notebook
(https://colab.research.google.com/drive/1Onh6xlg-a-_QDy2EL_t9XmKa8T3VLVEv).
Metrics and statistical testing
Following the reasoning from the ProteinUnet2 paper, we use the
Adjusted Geometric Mean (AGM) as the primary metric for assessing
prediction performance. It is well suited to imbalanced bioinformatics
problems, performs better than the F-score in such settings,
and has no free parameters (such as the beta in the F-score) 44.
It is given by Equation 4, where GM is the geometric mean (Equation 5)
and Nn is the proportion of negative samples. It takes
values in the range from 0 to 1, where 1 indicates a perfect
prediction. The metric can be calculated both at the residue and at the
sequence level. By the residue level, we mean calculating the
metric once over all residues in all sequences of the dataset; by the sequence level, we mean calculating the metric separately for
each sequence and averaging the resulting scores. To
aggregate the metric across the 8 classes, we use macro averaging: we
calculate the AGM score separately for each class and average the
results to obtain the macro-AGM score.
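For concreteness, the sketch below computes residue-level AGM and macro-AGM assuming the standard AGM definition from the cited work (AGM = (GM + Sp·Nn)/(1 + Nn) when the sensitivity is non-zero, and 0 otherwise; cf. Equations 4 and 5); the helper names and the assumption of integer-coded SS8 labels are ours.

```python
import numpy as np

def agm(y_true, y_pred):
    """AGM for one class (boolean one-vs-rest labels), assuming the standard
    definition: AGM = (GM + Sp * Nn) / (1 + Nn) if Se > 0, else 0,
    where GM = sqrt(Se * Sp) and Nn is the proportion of negative samples."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)

    se = tp / (tp + fn) if (tp + fn) > 0 else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) > 0 else 0.0   # specificity
    nn = (tn + fp) / y_true.size                    # proportion of negative samples
    if se == 0.0:
        return 0.0
    gm = np.sqrt(se * sp)                           # geometric mean
    return (gm + sp * nn) / (1.0 + nn)              # adjusted geometric mean

def macro_agm(y_true, y_pred, n_classes=8):
    """Residue-level macro-AGM: one-vs-rest AGM per class, averaged.
    Expects integer-coded SS8 labels (0..7) pooled over all residues."""
    return float(np.mean([agm(y_true == c, y_pred == c) for c in range(n_classes)]))
```

Under this reading, the sequence-level score would apply macro_agm to each sequence separately and average the per-sequence results.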