We also investigate the effect of ensemble learning with multiple
sequences, which could guide the choice of appropriate
sequences for PLDC. In each DA setting, the models using multiple
sequences are always more effective than those using any single sequence
alone. Moreover, although ADC or hDWI alone always leads to the worst
classification results, ensembling T2 with one or both of them
clearly enhances the model’s performance. This finding is consistent
with the clinical practice of using mpMRI for PCa diagnosis. The ADC
and hDWI sequences are usually treated as secondary references by
radiologists. Notably, the all-sequence-ensembled models (i.e. the
ensemble of T2, ADC, and hDWI) yield strong predictions in
most DA settings. Although the ensemble of the three sequences does not
achieve the best performance in the second DA setting (i.e. P-x → LC-A),
it still attains a high AUC of 0.91, which is only 0.01 (about 1%)
below the highest AUC (0.92). It
can be concluded that using more sequences helps multi-cohort MRI
harmonization, thus boosting the final classification performance.
Moreover, with the same target domain (i.e. either LC-A or LC-B),
CMD²A-Net attains a higher AUC when transferred from P-x than when
transferred from a local cohort domain, for every sequence combination.
This implies that more source samples could enhance the model’s
cross-domain knowledge transferability, thus improving its
generalization in the target domain. The superior performance also
demonstrates CMD²A-Net’s capability of transferring knowledge from a
public dataset to our local cohort domains.
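The sequence-combination comparison discussed above can be sketched as follows. This is only an illustrative sketch: the fusion rule (averaging the per-case malignancy probabilities of the single-sequence models) is an assumption, since the text does not specify how the ensemble is formed, and the labels and probabilities are invented toy values that do not reproduce the reported AUCs. The helper names `auc` and `ensemble` are hypothetical.

```python
from itertools import combinations


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble(prob_by_seq, names):
    """Average per-case probabilities over the chosen sequences (assumed fusion rule)."""
    columns = zip(*(prob_by_seq[name] for name in names))
    return [sum(vals) / len(vals) for vals in columns]


# Toy data, invented for illustration only (1 = csPCa, 0 = non-csPCa).
labels = [1, 1, 1, 0, 0, 0]
prob_by_seq = {
    "T2":   [0.9, 0.6, 0.4, 0.5, 0.2, 0.1],
    "ADC":  [0.7, 0.3, 0.6, 0.4, 0.5, 0.2],
    "hDWI": [0.6, 0.5, 0.7, 0.6, 0.3, 0.4],
}

# Evaluate every single sequence and every ensemble of two or three sequences.
results = {}
for k in range(1, len(prob_by_seq) + 1):
    for names in combinations(prob_by_seq, k):
        results[" + ".join(names)] = auc(labels, ensemble(prob_by_seq, names))

for combo, value in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{combo:<18} AUC = {value:.3f}")
```

Enumerating all seven combinations in one loop keeps the single-sequence and ensembled models on the same evaluation path, so their AUCs are directly comparable within a DA setting.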
Figure 2 shows the coarse lesion detection results for correctly
classified and misclassified examples. Two DA settings (i.e.
P-x → LC-A and P-x → LC-B) were selected as representatives for
lesion detection evaluation, and the results of the
all-sequence-ensembled method are analyzed. In the correctly
classified examples, the coarse lesion contours encircle the lesion
ground-truth point in all sequences (as shown in Figure 2a).
However, in the misclassified examples, the coarse lesion position could
not be precisely detected in most sequences, as shown in the third row.
In the example of LC-A, the lesion on T2 is correctly detected, but the
lesion contours on the ADC and hDWI maps are falsely identified. A
possible reason is that the coarse lesion masks applied as the training
ground truth could not depict the actual lesion contours accurately.
This example indicates that accurate detection on ADC and hDWI also
plays a role in enhancing the ensembled classification, although lesion
detection generally relies heavily on T2 images. In the future, robust
weak-label processing methods (e.g., the deep extreme level set evolution
method [36]) are expected to be employed. For the
example from LC-B, under-segmentation of the prostate region can be
found on the T2 image, which could lead to failed lesion detection. As
the prostate regions on ADC and hDWI were transformed using T2,
under- or over-segmentation of the prostate gland on T2 would degrade
lesion detection in the other two sequences. Despite the inaccurate
lesion detection on ADC and hDWI, it should be noted that the models
with multi-sequence input still outperform the models using T2 alone in
lesion classification, owing to the re-use of prostate features
from ADC and hDWI.
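The qualitative criterion used above, whether a coarse lesion contour encircles the lesion ground-truth point in each sequence, can be expressed as a point-in-polygon test. The sketch below is a minimal illustration under stated assumptions: it treats each contour as a closed polygon of (x, y) vertices and uses standard ray casting. It is not the evaluation procedure of the paper, and the function names and coordinates are hypothetical.

```python
def point_in_contour(point, contour):
    """Ray casting: True if `point` (x, y) lies inside the closed polygon `contour`."""
    x, y = point
    inside = False
    n = len(contour)
    for i in range(n):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def detection_hits(gt_point, contours_by_seq):
    """Per-sequence flag: does the predicted contour encircle the ground-truth point?"""
    return {seq: point_in_contour(gt_point, c) for seq, c in contours_by_seq.items()}


# Hypothetical coordinates, invented for illustration only.
gt_point = (5.0, 5.0)
contours = {
    "T2":   [(2, 2), (8, 2), (8, 8), (2, 8)],      # encloses the point
    "ADC":  [(6, 6), (10, 6), (10, 10), (6, 10)],  # misses the point
    "hDWI": [(0, 0), (4, 0), (4, 4), (0, 4)],      # misses the point
}

hits = detection_hits(gt_point, contours)
print(hits)
```

With these toy contours the pattern mirrors the LC-A example: the T2 contour contains the ground-truth point while the ADC and hDWI contours do not.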
2.4. Comparisons with the State-of-the-art Methods
We compared our model with three state-of-the-art models, i.e.
ResNet50 [37], DANN [38], and Deep CORAL [25], in terms of AUC. The
P-x dataset was used as the source domain, and our local cohort
datasets, LC-A and LC-B, served as the target domains. Both the
individual sequences (i.e. T2, ADC, and hDWI) and the all-sequence
ensemble (i.e. T2 + ADC + hDWI) were evaluated. The other
ensembles (T2 + ADC, T2 + hDWI, and ADC + hDWI) were not
included here due to their inferior performance, as discussed in Section
4.2. Detailed comparison results are summarized in Table 5.
Table 5. AUC comparisons on malignancy classification (i.e.
csPCa or non-csPCa) with the three existing models.