Statistical sampling of missing environmental variables improves
biophysical genomic prediction
Abstract
Since the invention of whole genome prediction (WGP) more than two
decades ago, breeding programs have established extensive reference
populations that are cultivated under diverse environmental conditions.
The introduction of the CGM-WGP model, which integrates crop growth
models (CGM) with WGP, has expanded the applications of WGP to the
prediction of unphenotyped traits in untested environments, including
future climates. However, unlike WGP, CGMs require multiple seasonal
environmental records, which makes CGM-WGP less accurate when applied to
historical reference populations that lack crucial environmental inputs.
Here, we investigated whether CGM-WGP can approximate missing
environmental variables and thereby improve prediction accuracy. Two
environmental variables in a wheat CGM, initial soil water content
(InitlSoilWCont) and initial nitrate profile, were sampled from
different normal distributions separately or jointly in each iteration
within the CGM-WGP algorithm. Our results showed that sampling
InitlSoilWCont alone gave the best results and improved the prediction
accuracy of grain number by 0.07, yield by 0.06 and protein content by
0.03. When using the sampled InitlSoilWCont values as an input for the
traditional CGM, the average narrow-sense heritability of the
genotype-specific parameters (GSPs) improved by 0.05, with GNSlope,
PreAnthRes and VernSen showing the greatest improvements. Moreover, the
root mean square error (RMSE) for grain number and yield was reduced by
about 7% for CGM and 31% for CGM-WGP when using the sampled
InitlSoilWCont values. Our results demonstrate the advantage of sampling
missing environmental variables in CGM-WGP to improve prediction
accuracy and increase the size of the reference population by enabling
the use of historical data that lack environmental
records.
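
The idea of drawing a missing environmental input from a normal distribution in each iteration can be illustrated with a minimal sketch. It is not the CGM-WGP algorithm itself: `toy_cgm`, `log_likelihood`, `sample_missing_input`, the prior mean and standard deviation, and the Metropolis acceptance step are all hypothetical stand-ins chosen for illustration, and the missing input is treated as a single fraction between 0 and 1.

```python
import math
import random


def toy_cgm(gsp, init_soil_water):
    """Hypothetical stand-in for a crop growth model (not a real wheat CGM)."""
    return gsp * (0.5 + init_soil_water)


def log_likelihood(observed, predicted, sigma=0.5):
    """Gaussian log-likelihood of the observations, up to an additive constant."""
    return -sum((o - p) ** 2 for o, p in zip(observed, predicted)) / (2 * sigma ** 2)


def sample_missing_input(observed, gsps, prior_mean=0.5, prior_sd=0.2,
                         n_iter=2000, burn_in=500, seed=0):
    """Estimate a missing environmental input by sampling it in every iteration.

    Assumption: the prior is a normal distribution clipped to [0, 1], and
    proposals are drawn directly from that prior, so the Metropolis-Hastings
    acceptance ratio reduces to a likelihood ratio.
    """
    rng = random.Random(seed)
    current = prior_mean
    current_ll = log_likelihood(observed, [toy_cgm(g, current) for g in gsps])
    draws = []
    for _ in range(n_iter):
        # Draw a candidate value from the clipped normal prior.
        proposal = min(1.0, max(0.0, rng.gauss(prior_mean, prior_sd)))
        proposal_ll = log_likelihood(observed, [toy_cgm(g, proposal) for g in gsps])
        # 1 - random() lies in (0, 1], which keeps the logarithm finite.
        if math.log(1.0 - rng.random()) < proposal_ll - current_ll:
            current, current_ll = proposal, proposal_ll
        draws.append(current)
    kept = draws[burn_in:]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    gsps = [4.0, 5.0, 6.0]                        # made-up genotype parameters
    observed = [toy_cgm(g, 0.8) for g in gsps]    # simulated with a "true" value of 0.8
    print(sample_missing_input(observed, gsps))
```

In this toy setting the sampled values concentrate near the value used to simulate the observations, pulled slightly towards the prior mean. In the study, the sampling is embedded in the full CGM-WGP algorithm alongside the genotype-specific parameters.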