Comparison with LM-based classifiers
We compared our network with the three most recent networks of similar
utility that are based on features from pLMs: NetSurfP-3.0 [26],
ProtT5Sec [23], and SPOT-1D-LM [27]. SPOT-1D-LM uses features from both
the ProtTransT5-XL-U50 [23] and ESM-1b [39] LMs, NetSurfP-3.0 uses only
ESM-1b with 1280 features, and ProtT5Sec uses only ProtTransT5-XL-U50
with 1024 features. We ran SPOT-1D-LM from its source code
(https://github.com/jas-preet/SPOT-1D-LM), and we used the web
interfaces to run NetSurfP-3.0 (https://dtu.biolib.com/NetSurfP-3/) and
ProtT5Sec (https://api.bioembeddings.com/). Note that these networks
were trained on different but partially overlapping datasets.
ProteinUnetLM was trained on 10029 sequences (TR10029 dataset) and
validated on 983 sequences (VAL983 dataset), NetSurfP-3.0 and ProtT5Sec
were trained on 10337 and validated on 500 sequences, and SPOT-1D-LM was
trained on 38913 sequences (including most of the sequences from TR10029
and TEST2016) and validated on 100 sequences. To ensure no overlap
between the training and test sets, we used only the test sets from
SPOT-1D-LM for the comparisons in this section. We attempted to train
the ProteinUnetLM model using the larger datasets from SPOT-1D-LM, but,
surprisingly, the results were suboptimal (as presented in Supplementary
Table S1), so we decided to keep the model based on the TR10029 dataset.
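The overlap concern described above can be illustrated with a minimal sketch that flags test sequences which also occur verbatim in a training set. This is not the procedure used to build any of the datasets named here: the function names and the toy FASTA records are hypothetical, and exact string matching is a deliberate simplification (it does not detect homologous sequences that differ by a few residues).

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict {identifier: sequence}."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            header = line[1:].split()
            current_id = header[0] if header else ""
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line.upper()
    return records


def exact_overlap(train, test):
    """Return sorted test IDs whose sequence also occurs in the training set.

    Assumption: only identical sequences count as overlap.
    """
    train_sequences = set(train.values())
    return sorted(tid for tid, seq in test.items() if seq in train_sequences)


# Toy records, made up for illustration (not taken from any real dataset).
train_fasta = """>train_a
MKTAYIAKQR
>train_b
GSHMLEDPVA
"""
test_fasta = """>test_x
GSHMLEDPVA
>test_y
AAAAGGGGSS
"""

leaked = exact_overlap(read_fasta(train_fasta), read_fasta(test_fasta))
print(leaked)  # IDs of test sequences that would have to be excluded
```

In this toy case only `test_x` is flagged, because its sequence is identical to that of `train_b`.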
The comparison of ProteinUnetLM with these three networks on 5 different
test sets is presented in Table 2. First of all, ProteinUnetLM was
statistically significantly better than NetSurfP-3.0 for all test sets
in macro-AGM and SOV8 metrics, with relatively large effect sizes (d
> 0.3). ProteinUnetLM also had much better residue-level
metrics, except for macro-AGM on TEST2018, for which NetSurfP-3.0
correctly predicted the rarest structure “I” (Supplementary Table S3).
The main advantage of ProteinUnetLM over the SPOT-1D-LM network was
better macro-AGM for all test sets, statistically significant (with a
small effect size d ≈ 0.1) for the three largest sets TEST2018,
TEST2020, and TEST2020-HQ. This is because ProteinUnetLM
achieves much better results for the rare structures B, G, and S without
losing much accuracy for the frequent ones. For the same reason,
SPOT-1D-LM had better Q8 on most of the test sets (excluding CASP12-FM),
but as mentioned in Section 2.4, this metric is not appropriate for
assessing SS8 prediction.
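As a rough illustration of how the effect sizes quoted above can be read, the sketch below computes Cohen's d from two lists of per-sequence scores. It assumes the classic pooled-standard-deviation form of d; the exact variant used for Table 2 is not stated in this section, and the scores below are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(scores_a, scores_b):
    """Cohen's d using the pooled sample standard deviation.

    Positive values mean that scores_a has the higher mean.
    Assumption: pooled-SD form; a paired variant would instead divide
    the mean difference by the SD of the per-sequence differences.
    """
    n_a, n_b = len(scores_a), len(scores_b)
    var_a, var_b = stdev(scores_a) ** 2, stdev(scores_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(scores_a) - mean(scores_b)) / pooled_sd


# Hypothetical per-sequence scores of two models on the same five sequences.
model_a = [0.70, 0.80, 0.90, 0.60, 0.75]
model_b = [0.65, 0.78, 0.85, 0.55, 0.70]

d = cohens_d(model_a, model_b)
print(f"Cohen's d = {d:.2f}")
```

Because d expresses the mean difference in units of standard deviation, it stays comparable across test sets of different sizes, whereas a p-value alone shrinks as the number of sequences grows.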
Table 2. Comparison of macro-AGM and Q8 at the residue level, and SOV8
at the sequence level, on 5 test sets for ProteinUnetLM vs NetSurfP-3.0,
ProtT5Sec, and SPOT-1D-LM. The best results for each dataset are
boldfaced. Green shading of sequence-level scores denotes that
ProteinUnetLM has a statistically significantly better mean; standard
deviations (SD), p-values, and Cohen’s effect sizes (d) are given below
the scores.