3.3.1 Analysis of chameleon sequences
To understand the differences between the networks on the CASP14 dataset, we analyzed their predictions for chameleon sequences (ChSeqs) – specific amino acid sequences known to adopt different 3-class SS (H, E, C) in unrelated proteins. This analysis is considered one of the most rigorous tests for SS predictors because the conformations of ChSeqs depend on non-local, protein-specific interactions 54,55. We searched CASP14 for all 4-element ChSeqs listed in the database from 56 and created a CASP14-ChSeqs set containing 3202 such 4-element sequences and their associated SS (for the first element in the sequence) according to DSSP. In Supplementary Figure S2, we compare the numbers and types of mistakes made on CASP14-ChSeqs by all the networks. The largest number of mistakes and the largest differences between networks were observed for the loop class. ProteinUnetLM mistook helix for coil (H → C) over 2x less often than ProteinUnet2, reaching a level similar to AlphaFold2. The biggest disadvantage of AlphaFold2 was the overprediction of helices instead of coils (the highest number of C → H errors of all networks), which is in line with the conclusions from 30. MSA-based networks (ProteinUnet2 and SPOT-1D) made over 80 more mistakes than their LM-based counterparts, which confirms the higher predictive power of LM features for challenging chameleon sequences. ProteinUnetLM achieved the 3rd best result after AlphaFold2 and SPOT-1D-LM, beating NetSurfP-3.0.
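The error tally described above can be sketched as follows. This is a minimal illustration, not the authors' code: `chameleon_4mers` stands in for the set of 4-residue ChSeqs from the database, and the three parallel lists hold each protein's sequence, DSSP-derived SS, and predicted SS.

```python
from collections import Counter

def tally_chseq_errors(sequences, true_ss, pred_ss, chameleon_4mers):
    """Count SS confusions (e.g. 'H->C') at the first residue of every
    4-element chameleon sequence occurring in the dataset."""
    errors = Counter()
    for seq, ss_true, ss_pred in zip(sequences, true_ss, pred_ss):
        for i in range(len(seq) - 3):
            if seq[i:i + 4] in chameleon_4mers:
                t, p = ss_true[i], ss_pred[i]  # SS of the first element
                if t != p:
                    errors[f"{t}->{p}"] += 1
    return errors
```

Comparing such counters across networks yields per-class confusion counts like the H → C and C → H figures discussed above.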

Running time comparison

For a comparison of running times of LM-based models, we used a laptop with an Nvidia RTX 2080 Max-Q GPU and an Intel i7-10750H CPU. The prediction time includes the time needed for feature generation and the inference time of the networks for SS prediction only (i.e., excluding the regression-based networks that generate the other outputs of SPOT-1D-LM), using batch size 1. It excludes the time needed for program initialization, data loading, and saving the results to disk. We were unable to measure the inference time of NetSurfP-3.0 on the same computer, as the model is accessible only for online end-to-end prediction. Instead, we assumed that the inference time of NetSurfP-3.0 is 5.3x shorter than that of SPOT-1D-LM, based on the information from article 26; this assumption is marked with an asterisk in Table 4, which presents the times.
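The timing protocol above can be sketched as a simple harness. This is an assumed illustration: `feature_extractor` and `ss_network` are hypothetical stand-ins for the pLM and the SS prediction network, and only the feature-generation and inference steps fall inside the timed region.

```python
import time

def timed_prediction(feature_extractor, ss_network, sequences):
    """Time feature generation plus SS inference only (batch size 1),
    excluding program initialization, data loading, and result saving."""
    start = time.perf_counter()
    outputs = []
    for seq in sequences:  # batch size 1: one sequence at a time
        features = feature_extractor(seq)     # pLM embedding step
        outputs.append(ss_network(features))  # SS prediction step
    elapsed = time.perf_counter() - start
    return outputs, elapsed
```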
The feature calculation time for ProtTransT5-XL-U50 (ProteinUnetLM) and ESM-1b (NetSurfP-3.0) on GPU is similar, with ESM-1b being 1.5x faster on CPU. The feature calculation time for SPOT-1D-LM is the sum of both, which makes it around 2x longer. ProteinUnetLM has a nearly 3x shorter inference time on CPU (3 s) than on GPU (8 s). This is because ProteinUnetLM is so lightweight that loading the pLM features (1024 x 704 values) into the GPU and retrieving the result takes longer than simply running the model on the CPU. Consequently, the optimal approach is to generate features on the GPU and run inference on the CPU. This makes the inference time around 7x shorter than for SPOT-1D-LM on GPU and around 66x shorter on CPU. The resulting prediction time of 38 s (152 ms per sequence) is on par with the estimated prediction time of NetSurfP-3.0 on GPU and 2.4 times shorter than that of SPOT-1D-LM on GPU. Additionally, ProteinUnetLM can be used effectively without a GPU, with a prediction time shorter than 3 s per sequence. It is worth adding that, if necessary, ProteinUnetLM can be sped up further without losing much accuracy by training without AA on input (Supplementary Table S1).
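The per-sequence figure quoted above follows directly from the reported total on the 250-sequence TEST2018 set:

```python
# Per-sequence prediction time implied by the reported 38 s total
# on the 250-sequence TEST2018 set.
total_s, n_sequences = 38.0, 250
per_seq_ms = total_s / n_sequences * 1000
print(per_seq_ms)  # 152.0 ms per sequence
```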
Table 4. The comparison of running times for ProteinUnetLM, SPOT-1D-LM, and NetSurfP-3.0 on the TEST2018 set with 250 sequences.