In Table 3, for the T2 sequence, BFC combined with either scaled or
whitening outperforms the baselines. In particular, BFC with whitening
achieves the best AUCs of 0.91 and 0.80 on P-x and LC-A, respectively.
However, these findings are not consistent with the results for ADC and
hDWI. For ADC, the models preprocessed with BFC or NF underperform the
baselines. Instead, the baseline models obtain the highest AUCs, with
scaled alone and whitening alone reaching 0.73 and 0.72 on P-x and LC-A,
respectively. For hDWI, both BFC and NF contribute only limited
improvement over the baselines. On P-x, the AUC increases marginally
from 0.73 (scaled only) to 0.80 (scaled with NF); on LC-A, an AUC of
only 0.65 is achieved using scaled with BFC. The results for the three
sequences show that
these preprocessing approaches can improve CM-Net’s classification
performance when combining our two datasets. However, none of the
methods considerably boosts the generalization of the joint models
compared with the separate models of P-x and LC-A (in Table 2).
This indicates that the preprocessing methods are probably insufficient
to fundamentally resolve the domain shift.
A possible reason is that the severe discrepancies stem from inter-site
differences (in Table 1) rather than solely from the intensity
distributions of the heterogeneous mpMRI sequences (see details in
Supplementary Figure 2).
2.3. Cross-domain Malignancy Classification and Lesion Detection
We emphasize the importance of knowledge transfer from a large-scale
public dataset to a small-scale target domain. The malignancy
estimation performance of CMD²A-Net (the architecture is shown in
Figure 4 and described in detail in the Methods section) is evaluated.
The P-x dataset is used only as a source domain. Either LC-A or LC-B
also serves as a source domain for knowledge transfer between the local
cohorts. The scaled method was employed for image preprocessing. In
general, the types of MR sequences available may vary across
healthcare institutions. Therefore, we employed ensemble learning to
handle multiple sequences, so that our framework can operate on either
a single sequence or multiple sequences. Three common metrics were
adopted to evaluate classification performance, namely AUC, sensitivity
(SEN), and specificity (SPE).
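As an illustration of how per-sequence predictions might be combined and scored, the following is a minimal sketch rather than the implementation used in this work. It assumes that the ensemble simply averages the malignancy probabilities of whichever sequences are available, and that SEN and SPE are computed at a fixed threshold of 0.5; neither choice is specified in this section. The function names and the toy scores are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def ensemble_average(probs_by_sequence: Dict[str, Sequence[float]]) -> List[float]:
    """Average per-lesion probabilities over the available sequences.

    Assumption: plain averaging; the ensemble rule actually used may differ.
    """
    if not probs_by_sequence:
        raise ValueError("at least one sequence is required")
    columns = list(probs_by_sequence.values())
    n = len(columns[0])
    if any(len(c) != n for c in columns):
        raise ValueError("all sequences must score the same lesions")
    return [sum(c[i] for c in columns) / len(columns) for i in range(n)]


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (SEN, SPE) after thresholding the scores (threshold is assumed)."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if y == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outranks a
    random negative case, counting ties as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores for six lesions (1 = malignant).
y_true = [1, 0, 0, 1, 1, 0]
scores = {
    "T2": [0.9, 0.4, 0.3, 0.5, 0.8, 0.1],
    "ADC": [0.7, 0.8, 0.1, 0.3, 0.6, 0.3],
}
fused = ensemble_average(scores)           # multi-sequence setting
single = ensemble_average({"T2": scores["T2"]})  # single-sequence setting
sen, spe = sensitivity_specificity(y_true, fused)
fused_auc = auc(y_true, fused)
```

Because the averaging step accepts any non-empty subset of sequences, the same code path covers sites that acquire only one sequence and sites that acquire several.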
Table 4. Malignancy classification results in the target
domains for four combinations of source and target domains.