Results

Comparison with MSA-based classifiers

First, we directly compared ProteinUnetLM with its predecessor ProteinUnet2, which is based on multiple sequence alignment (MSA) features (https://codeocean.com/capsule/0425426), and with the competing SPOT-1D network 20 (https://sparks-lab.org/server/spot-1d/). ProteinUnetLM achieved the best results in 7 out of 8 combinations of test set (TEST2016 and TEST2018) and metric (macro-AGM at the residue and sequence levels, SOV8, and Q8 at the residue level) presented in Table 1. ProteinUnetLM had a statistically significantly better macro-AGM than ProteinUnet2 (p=0.0054 on TEST2016) and SPOT-1D (p=2e-5 on TEST2016 and p=0.0077 on TEST2018), with small effect sizes (d < 0.2). There were no large or significant differences between the networks for the SOV8 metric. ProteinUnetLM achieved the highest Q8 among all networks.
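To make the reported effect sizes easier to interpret, the following minimal sketch shows one common way to compute Cohen's d for two sets of per-sequence scores. It assumes the pooled-standard-deviation variant of d, which this section does not specify. The function name `cohens_d` and the example scores are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(scores_a, scores_b):
    """Cohen's d using the pooled sample standard deviation.

    Assumption: the pooled-SD variant is used here for illustration;
    the paper's exact variant is not stated in this section.
    """
    n_a, n_b = len(scores_a), len(scores_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two scores")
    # Pool the unbiased sample variances, weighted by degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(scores_a) + (n_b - 1) * variance(scores_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(scores_a) - mean(scores_b)) / sqrt(pooled_var)


# Hypothetical per-sequence scores for two models (illustrative only).
model_a = [2.0, 3.0, 4.0]
model_b = [1.0, 2.0, 3.0]
d = cohens_d(model_a, model_b)
```

A positive d means the first sample has the higher mean. Values of d below 0.2, as reported above, indicate that the mean difference is small relative to the spread of the per-sequence scores, even when it is statistically significant.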
The separate AGM scores for each SS8 class in Supplementary Table S2 show that ProteinUnetLM is better than ProteinUnet2 for all structures on the largest datasets, TEST2016 and TEST2018, especially on the rare structures B, G, S, and I. Importantly, ProteinUnetLM achieved correct predictions for the rarest structure “I”, which was not possible using MSA features in ProteinUnet2. This confirms that LMs provide better features for protein SS prediction than MSA-based features such as PSSM or HHblits profiles, especially given that ProteinUnetLM is a single model rather than an ensemble of 10 models like ProteinUnet2.
Table 1. Comparison of macro-AGM at the sequence and residue levels, SOV8 at the sequence level, and Q8 at the residue level on two test sets for ProteinUnetLM vs ProteinUnet2 and SPOT-1D. The best results for each dataset are boldfaced. Green shading of sequence-level scores denotes that ProteinUnetLM has a statistically significantly better mean; standard deviations (SD), p-values, and Cohen’s effect sizes (d) are given below each score.