Optimising Sentinel-2 Feature Space for Improved Crop Biophysical and
Biochemical Variables Retrieval Using the Novel Spectral Triad Feature
Selection Algorithm
Abstract
Machine learning regression algorithms (MLRAs) can learn complex and
non-linear relationships between the response and predictor variables.
However, studies have shown that feature subset selection is more
beneficial than using the full feature set, yielding high accuracy and
low uncertainty in retrieving biophysical and biochemical variables.
Feature subset selection techniques are often applied to
high-dimensional and correlated hyperspectral data but are seldom used
with multispectral data. Instead, previous studies utilising multispectral
data have mainly applied the entire feature space. The advent of
quasi-hyperspectral sensors, e.g., the Sentinel-2 MultiSpectral
Instrument (MSI), presents new challenges where two or more variables
may be collinear and degrade MLRA performance. This study presents a
novel Spectral Triad feature selection (STfs) technique based on music
theory and compares it to the entire MSI feature space and Random
Forest-Recursive Feature Elimination (RF-RFE). The optimal subsets were
evaluated with Random Forest for
retrieving leaf area index (LAI), leaf chlorophyll a + b (LCab) and
canopy chlorophyll content (CCC) in a semi-arid
agricultural landscape. The results indicated that the proposed STfs
algorithm obtained equivalent or better (by 1–3%) retrieval
results for LAI (R²cv: 66%, RMSEcv: 0.53 m² m⁻²),
LCab (R²cv: 74%, RMSEcv: 7.09 µg cm⁻²) and CCC
(R²cv: 77%, RMSEcv: 33.69 µg cm⁻²), using only 5, 7 and 7
variables, respectively, when compared to RF-RFE and the entire MSI
feature space. Overall, the proposed STfs algorithm has great potential to
optimise the spectral feature space of quasi-hyperspectral sensors for
rapid crop biophysical and biochemical parameter retrieval.
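
To make the comparison baseline concrete, the following is a minimal sketch of RF-RFE followed by cross-validated Random Forest scoring, written with scikit-learn. It does not reproduce the STfs algorithm or the study's data: the reflectance matrix, the response variable, the number of trees, the fold count and the helper name cv_scores are all assumptions made for illustration, and the values it prints are not the results reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic stand-in data (assumption): 120 samples x 10 "bands".
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.5, size=(120, 10))
# Hypothetical response driven by three of the bands plus noise.
y = 4.0 * X[:, 7] - 2.0 * X[:, 3] + 1.5 * X[:, 5] + rng.normal(0.0, 0.05, 120)


def cv_scores(X, y, n_splits=5, seed=0):
    """Return cross-validated (R2, RMSE) for a Random Forest regressor."""
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    pred = cross_val_predict(model, X, y, cv=folds)
    rmse = float(np.sqrt(mean_squared_error(y, pred)))
    return float(r2_score(y, pred)), rmse


# RF-RFE: recursively drop the least important feature until 5 remain.
selector = RFE(
    RandomForestRegressor(n_estimators=100, random_state=0),
    n_features_to_select=5,
    step=1,
)
selector.fit(X, y)
selected = np.flatnonzero(selector.support_)

# Compare the full feature space against the RFE subset.
r2_full, rmse_full = cv_scores(X, y)
r2_sub, rmse_sub = cv_scores(X[:, selected], y)
print("selected band indices:", selected.tolist())
print(f"full space: R2cv={r2_full:.2f}, RMSEcv={rmse_full:.3f}")
print(f"RFE subset: R2cv={r2_sub:.2f}, RMSEcv={rmse_sub:.3f}")
```

In this sketch the subset is selected on all samples before cross-validation, which can make the subset scores optimistic; wrapping the selector and the regressor in a single scikit-learn Pipeline would repeat the selection inside each fold.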