Conditional Biomarker Distribution Simulations for
Under-represented Groups
A conditional GAN was used to evaluate whether the GAN method could
generate biomarker distributions for specific race/ethnicity groups
(Black, Hispanic, Other and White), including under-represented groups.
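The source does not state how the race/ethnicity label enters the generator. As a minimal sketch, assuming the common conditional-GAN convention of appending a one-hot label to the noise vector, the generator input for one subject could be assembled as follows. The names `GROUPS`, `NOISE_DIM`, `one_hot` and `generator_input`, and the noise dimension of 32, are hypothetical.

```python
import random

# The four race/ethnicity groups named in the text.
GROUPS = ["Black", "Hispanic", "Other", "White"]
NOISE_DIM = 32  # hypothetical latent size; not given in the source


def one_hot(group):
    """Return a one-hot list marking the position of `group` in GROUPS."""
    if group not in GROUPS:
        raise ValueError(f"unknown group: {group!r}")
    return [1.0 if g == group else 0.0 for g in GROUPS]


def generator_input(group, rng):
    """Concatenate standard-normal noise with the one-hot group label."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
    return noise + one_hot(group)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
z = generator_input("Hispanic", rng)
```

Under this assumption, a trained generator would map each such vector to one synthetic 14-biomarker profile for the requested group.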
The numbers (%) of Black, Hispanic, Other and White subjects in the test
data set were 1730 (20.8%), 2228 (26.8%), 1137 (13.7%) and 3230
(38.8%), respectively; the total number of subjects was 8325.
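The group percentages follow directly from the reported counts. A short check of that arithmetic, using only the counts stated above (the variable names are illustrative):

```python
# Subject counts per group in the test data set, as reported in the text.
counts = {"Black": 1730, "Hispanic": 2228, "Other": 1137, "White": 3230}

# Total number of subjects across the four groups.
total = sum(counts.values())

# Share of each group, in percent, rounded to one decimal place.
percentages = {group: round(100 * n / total, 1) for group, n in counts.items()}
```

The rounded shares sum to 100.1% rather than 100% because each value is rounded independently.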
The t-SNE projections of the GAN-generated data and the test data
distributions for the four race/ethnicity groups are compared in
Figure 4. The corresponding UMAP projections are summarized in
Supplementary Figure 3. The t-SNE and UMAP projections of the
GAN-generated distributions were qualitatively well dispersed across the
test data for all four race/ethnicity groups. The box plots in Figure 5
and Supplementary Figure 4 compare the univariate distributions of the
14 biomarkers and show that the GAN-generated data are concordant with
the test data for each race/ethnicity group.
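A box-plot comparison of this kind rests on per-group quartiles of each biomarker. The following is a minimal sketch of such a summary using Python's `statistics.quantiles`. The function names `box_stats` and `quartile_gaps` and the toy values are placeholders, not study data, and the quartile convention used for the published figures is not stated in the source.

```python
from statistics import quantiles


def box_stats(values):
    """Return the five-number summary (min, Q1, median, Q3, max)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


def quartile_gaps(generated, test):
    """Absolute differences between the summaries of two samples."""
    return tuple(
        abs(g - t) for g, t in zip(box_stats(generated), box_stats(test))
    )


# Toy placeholder values for one biomarker in one group (not study data).
test_values = [1.0, 2.0, 3.0, 4.0, 5.0]
generated_values = [1.5, 2.5, 3.5, 4.5, 5.5]

gaps = quartile_gaps(generated_values, test_values)
```

Small gaps across all 14 biomarkers within a group would correspond to the visual concordance described for the box plots, although this sketch is not the authors' evaluation procedure.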
Together, these results demonstrate that the GAN strategy can generate
satisfactory approximations of high-dimensional joint biomarker
distributions in under-represented groups.