Figure 3: Distribution of the scores shown in Table 1 and Table 2 for A) Top1-pose RMSD, B) Top1-pose lDDT-PLI, C) Top5-pose RMSD, and D) Top5-pose lDDT-PLI. The lines and the black dots in the bars represent the median and the mean, respectively.
As expected, the results for Best pocket docking are better than those for Blind docking, since the search space is restricted. DiffDock performs well on this set despite only offering a blind docking mode, as also reported by its authors, who suggest using DiffDock as a ligand-specific pocket detector34. The difference between the two scoring metrics is especially apparent when comparing the blind docking results of GNINA and TankBind in Table 1. The median RMSD is worse for GNINA, indicating more "severe" failures that inflate the RMSD, which is an unbounded metric. In contrast, lDDT-PLI is bounded: contacts deviating beyond the largest threshold contribute a score of 0, so even worse predictions are penalized no further. In addition, lDDT-PLI does not penalize parts of the ligand that float in regions not in contact with the protein.

All the tools show a significant performance decrease when AlphaFold models are used as input. This is especially striking for Best pocket docking, where exact side-chain placement and backbone conformation appear to be crucial for physics-based docking tools to recover the correct ligand pose. Figure 4 illustrates this: although the backbone RMSD of the AlphaFold model is only 3.56 Å, a rearrangement has pushed a helix into the binding pocket, preventing the correct ligand pose from being found. This trend is less pronounced for the deep learning tool DiffDock, whose training relies less on side-chain atoms, although its performance on AlphaFold models is still lower than on crystal structures.
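To make the boundedness contrast between the two metrics explicit, their definitions can be sketched as follows (the lDDT tolerance thresholds of 0.5, 1, 2, and 4 Å are standard; the exact contact set used by the PLI variant depends on the implementation and is an assumption here). For a pose with $N$ ligand atoms at predicted positions $\hat{x}_i$ and reference positions $x_i$,

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \hat{x}_i - x_i \right\rVert^2},$$

which grows without bound as the pose deviates. In contrast, lDDT-PLI averages, over the tolerance thresholds, the fraction of reference protein-ligand contact distances $d_{ij}$ that are reproduced within tolerance by the predicted distances $\hat{d}_{ij}$:

$$\mathrm{lDDT\text{-}PLI} = \frac{1}{4}\sum_{t \in \{0.5,\,1,\,2,\,4\}\,\text{\AA}} \frac{\bigl|\{(i,j) \in C : |\hat{d}_{ij} - d_{ij}| < t\}\bigr|}{|C|},$$

where $C$ is the set of protein-ligand atom pairs within a fixed inclusion radius in the reference structure. Because the score is a fraction bounded in $[0, 1]$, a contact displaced beyond the largest threshold contributes 0 regardless of how far it moves, and ligand atoms with no reference contacts in $C$ are ignored entirely, whereas RMSD keeps growing with every additional ångström of deviation.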