Construction and evaluation of orchid SDMs
For comparison, we chose three model-building strategies: one based on a
regression algorithm (Generalized Linear Model, GLM), one on a
classification algorithm (Random Forest, RF), and one on a
machine-learning algorithm (MaxEnt). To reduce the model uncertainty
that sampling variability might introduce, we randomly generated 2000
pseudo-absence points for each dataset (all-data, t-data, e-data, and
m-data) and repeated this generation three times. Before running the
models, we adjusted their parameters: the number of trees in RF was set
to 1000, the GLM type was set to quadratic with an interaction level of
1, and MAXENT.Phillips.2 was run with its default settings. Each dataset
was split into 70% training data and 30% testing data, and this split
was repeated five times. All of the above operations were carried out
with the biomod2 package in R (4.2.1).
To distinguish the models clearly, we named each one by combining its
strategy with its dataset. For example, G-all denotes the orchid
distribution model built with GLM on all-data.
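
The run design and naming scheme described above can be illustrated
with a short, standalone Python sketch. It is not the biomod2 workflow
used in the study. The function names are hypothetical, the "R-" and
"M-" prefixes are assumed by analogy with the G-all example, and the
sketch assumes that the five 70/30 splits are repeated for every
pseudo-absence set.

```python
import itertools
import random

# Dataset labels come from the text. The "R" and "M" prefixes are an
# assumption made by analogy with the "G-all" example.
DATASETS = ["all", "t", "e", "m"]
STRATEGIES = {"G": "GLM", "R": "RF", "M": "MaxEnt"}
N_PA_SETS = 3         # pseudo-absence generations per dataset
N_SPLITS = 5          # repeated training/testing splits
TRAIN_FRACTION = 0.7  # 70% training, 30% testing


def model_name(prefix, dataset):
    """Combine a strategy prefix and a dataset label, e.g. 'G-all'."""
    return f"{prefix}-{dataset}"


def split_indices(n_records, train_fraction, rng):
    """Randomly split record indices into training and testing subsets."""
    indices = list(range(n_records))
    rng.shuffle(indices)
    n_train = round(n_records * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


def enumerate_runs():
    """List every (model name, pseudo-absence set, split) combination."""
    return [
        (model_name(prefix, dataset), pa_set, split)
        for dataset, prefix, pa_set, split in itertools.product(
            DATASETS,
            STRATEGIES,
            range(1, N_PA_SETS + 1),
            range(1, N_SPLITS + 1),
        )
    ]


runs = enumerate_runs()
# Illustrative split of 100 hypothetical records with a fixed seed.
train, test = split_indices(100, TRAIN_FRACTION, random.Random(42))
print(len(runs), len(train), len(test))
```

Under these assumptions the grid contains 4 x 3 x 3 x 5 = 180 single
model runs. If the splits were instead shared across pseudo-absence
sets, the count would be smaller.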
We used three indicators to evaluate the performance of our models: the
area under the receiver operating characteristic curve (AUC), the Kappa
coefficient, and the true skill statistic (TSS). The AUC is the
probability that the model assigns a higher predicted value to a
randomly selected presence than to a randomly selected absence. Its
value range is [0, 1], and the closer the value is to 1, the better the
model performs. The Kappa coefficient measures the agreement between
predicted and observed points after correcting for the agreement
expected by chance. The TSS, the sum of sensitivity and specificity
minus 1, is an improved index that addresses the dependence of Kappa on
prevalence. Both range from -1 to 1, and a value above 0.4 indicates
good predictive performance. Additionally, to explore the effect of the
HI factor in the models, we reran all model configurations without it.
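
To make the three indicators concrete, the following Python sketch
computes Kappa, TSS, and AUC from their standard definitions on a small
made-up example. It is not the biomod2 implementation. The function
names, the toy data, and the 0.5 threshold used to turn predicted
values into presence/absence are all assumptions for illustration.

```python
def confusion_counts(observed, predicted):
    """Return (tp, fp, fn, tn) for presence (1) / absence (0) labels."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    return tp, fp, fn, tn


def kappa(tp, fp, fn, tn):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    observed_agreement = (tp + tn) / n
    expected_agreement = (
        (tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)
    ) / n ** 2
    return (observed_agreement - expected_agreement) / (1 - expected_agreement)


def tss(tp, fp, fn, tn):
    """True skill statistic: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def auc(observed, scores):
    """Share of presence/absence pairs in which the presence scores
    higher; ties count as half. Assumes both classes are present."""
    presences = [s for o, s in zip(observed, scores) if o == 1]
    absences = [s for o, s in zip(observed, scores) if o == 0]
    wins = sum(
        1.0 if p > a else 0.5 if p == a else 0.0
        for p in presences
        for a in absences
    )
    return wins / (len(presences) * len(absences))


# Toy data (made up): four presences followed by four absences.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
THRESHOLD = 0.5  # assumed cut-off, not taken from the study
predicted = [1 if s >= THRESHOLD else 0 for s in scores]

counts = confusion_counts(observed, predicted)
print(kappa(*counts), tss(*counts), auc(observed, scores))
```

On this toy example both Kappa and TSS equal 0.5 and the AUC equals
0.875. Kappa and TSS depend on the chosen threshold, whereas the AUC is
computed from the predicted values directly.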