Construction and evaluation of orchid SDMs
For comparison, we chose three modeling strategies: a regression algorithm (Generalized Linear Model, GLM), a classification algorithm (Random Forest, RF), and a machine-learning algorithm (MaxEnt). To reduce the model uncertainty caused by sampling bias, we randomly generated 2000 pseudo-absence points for each dataset (all-data, t-data, e-data, and m-data) and repeated this procedure three times. Before running the models, we adjusted their parameters: the number of trees in RF was set to 1000, the GLM type was set to quadratic with an interaction level of 1, and MaxEnt was run as MAXENT.Phillips.2 with its default settings. Each dataset was randomly split into 70% training data and 30% testing data, and this split was repeated five times. All of the above operations were performed with the biomod2 package in R (4.2.1). To distinguish the models clearly, we named each one by combining the strategy with the dataset. For example, G-all denotes the orchid distribution model built with GLM on all-data.
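The resampling design described above can be sketched outside biomod2 as follows. This is a minimal illustration in Python, not the biomod2 workflow itself: the function names, the seeds, the number of background cells, and the way runs are enumerated are all assumptions made for the example, and it assumes that every algorithm is fitted once per pseudo-absence set and per split.

```python
import random


def sample_pseudo_absences(n_background, n_absences=2000, seed=0):
    """Draw pseudo-absence cell indices at random, without replacement.

    n_background is a hypothetical count of candidate background cells.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_background), n_absences))


def split_70_30(n_records, seed=0):
    """Return (train_idx, test_idx) for one random 70/30 split."""
    rng = random.Random(seed)
    idx = list(range(n_records))
    rng.shuffle(idx)
    cut = (n_records * 7) // 10  # integer arithmetic avoids float rounding
    return sorted(idx[:cut]), sorted(idx[cut:])


def enumerate_runs(datasets, algorithms, n_pa_sets=3, n_splits=5):
    """List every (algorithm, dataset, pseudo-absence set, split) combination."""
    runs = []
    for dataset in datasets:
        for pa_set in range(1, n_pa_sets + 1):
            for split in range(1, n_splits + 1):
                for algorithm in algorithms:
                    runs.append((algorithm, dataset, pa_set, split))
    return runs


datasets = ["all-data", "t-data", "e-data", "m-data"]
algorithms = ["GLM", "RF", "MaxEnt"]
runs = enumerate_runs(datasets, algorithms)
```

Under these assumptions, `runs` holds 4 datasets x 3 pseudo-absence sets x 5 splits x 3 algorithms = 180 individual model fits, that is, 45 fits per dataset.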
We employed three indicators to evaluate the performance of our models: the area under the receiver operating characteristic curve (AUC), the Kappa coefficient, and the true skill statistic (TSS). The AUC represents the probability that the model assigns a higher predicted value to a randomly selected presence than to a randomly selected absence. Its value range is [0, 1], and the closer the value is to 1, the better the model. The Kappa coefficient measures the agreement between predicted and observed points after correcting for the agreement expected by chance. The TSS, calculated as sensitivity + specificity - 1, is an improved test index based on the Kappa coefficient. Both range from -1 to 1, and a value above 0.4 indicates good predictive performance. Additionally, to explore the effect of the HI factor on the models, we reran all of them without it.
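For reference, the three indicators can be computed from their standard definitions as in the following sketch. It is written in Python for illustration and is not the biomod2 implementation; the function names and the example counts and scores are hypothetical. Kappa and TSS are computed from a confusion matrix (true positives, false negatives, false positives, true negatives) obtained after thresholding the predictions, while the AUC is computed directly from the predicted scores.

```python
def kappa(tp, fn, fp, tn):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return (observed - expected) / (1 - expected)


def tss(tp, fn, fp, tn):
    """True skill statistic: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def auc(presence_scores, absence_scores):
    """AUC as the share of presence/absence pairs ranked correctly.

    Ties between a presence score and an absence score count as half.
    """
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical confusion matrix: 50 presences and 50 absences.
example_kappa = kappa(tp=40, fn=10, fp=20, tn=30)
example_tss = tss(tp=40, fn=10, fp=20, tn=30)
# Hypothetical predicted scores for three presences and two absences.
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3])
```

With these hypothetical inputs, Kappa and TSS both equal 0.4 (up to floating-point rounding), and the AUC is 5/6, because five of the six presence/absence pairs are ranked correctly.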