2.3 Modeling framework and evaluation
To ensure the accuracy of sample points and avoid spatial oversampling
and sampling bias, we used the “Spatially Rarefy Occurrence Data for SDMs”
tool to select the occurrence locations (Fourcade et al.,
2014; Brown et al., 2017). Eighty-three
fossil sites and 116 recent macaque distribution locations were selected
for this study (Figure S1 in the Appendix).
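The spatial rarefying step can be illustrated with a minimal Python sketch. It assumes occurrences are given as (latitude, longitude) pairs in decimal degrees and applies a simple greedy rule: a point is kept only if it lies at least a chosen distance from every point already kept. This is not the algorithm of the tool cited above, and the function names and distances used here are hypothetical.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))  # mean Earth radius in km


def rarefy(points, min_dist_km):
    """Greedy spatial thinning: keep a point only if it is at least
    min_dist_km away from every point kept so far (input order matters)."""
    kept = []
    for p in points:
        if all(haversine_km(p, k) >= min_dist_km for k in kept):
            kept.append(p)
    return kept
```

For example, `rarefy([(30.0, 110.0), (30.05, 110.05), (31.0, 111.0)], 10)` drops the second point, which lies about 7 km from the first. Because greedy thinning depends on input order, shuffling the points and repeating the run is one way to check how sensitive the retained set is.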
Principal Component Analysis (PCA) was used to select variables that are
independent of one another yet closely related to macaque distribution,
based on their scores on the first three axes, which together account for
a large share of the total eigenvalue (variance). The Kaiser-Meyer-Olkin
(KMO) and Bartlett’s tests were applied to determine whether the variables
were suitable for PCA (Toll and Van Luit, 2013). Before modeling, among
each group of highly correlated variables, those with lower scores on the
axes with higher loading values were removed.
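The suitability checks and the variance share of the first three PCA axes can be sketched with NumPy. This is an illustrative implementation of the standard formulas (overall KMO from correlations and partial correlations; Bartlett's chi-square statistic from the determinant of the correlation matrix), not the software used in the study. `X` is assumed to be a samples × variables matrix of environmental values, and the function names are hypothetical.

```python
import numpy as np


def bartlett_sphericity(X):
    """Bartlett's test of sphericity: chi-square statistic and degrees of freedom."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)       # log|R|; R is positive definite
    stat = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) / 2
    return stat, df


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.diag(Rinv))
    partial = -Rinv / np.outer(d, d)       # partial correlation matrix
    off = ~np.eye(R.shape[0], dtype=bool)  # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)


def pca_variance_share(X, k=3):
    """Share of the total eigenvalue captured by the first k axes of a PCA
    on the correlation matrix, plus the loadings on those axes."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)   # returned in ascending order
    order = np.argsort(eigvals)[::-1]      # sort axes by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    share = eigvals[:k].sum() / eigvals.sum()
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    return share, loadings
```

A p-value for Bartlett's test can be obtained with `scipy.stats.chi2.sf(stat, df)`. KMO values closer to 1 indicate that the correlation structure is more suitable for PCA.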
Two types of models, corresponding to the variable types (BC, LU, and HP;
Table S1 in the Appendix), were established to model the distribution of
suitable macaque habitat under alternative climatic and environmental
factors and human population size, which have shaped and would drive the
trajectories of their geographic distribution in the years to come.
The BC models (a) retrospectively reconstruct suitable habitat areas for
the LIG and LGM periods separately; the BC-LU-HP models include (b) the
suitable habitat distribution scenario between 1970 and 2000 and (c) the
future suitable habitat distribution scenarios in the 2050s.
Five models for four periods were built with MaxEnt, which has two main
tunable parameters, the feature class (FC) and the regularization
multiplier (RM), that determine how closely a model fits the training data.
Because the default combination of these two parameters can cause
overfitting (Porfirio et al., 2014;
Qiao et al., 2015), we used the R package
ENMeval (Muscarella et al., 2014) to
select the optimal combination of FC and RM. Each model was run with ten
replicates to generate receiver operating characteristic (ROC) curves and
obtain the mean area under the curve (AUC), which was used to assess model
accuracy. AUC values range from 0 to 1, and a value of 0.5
represents a random model (Myerson et al.,
2001). The point on the ROC curve at which the tangent slope equals 1
corresponds to the maximized sum of sensitivity and specificity
(maxSSS) (Cantor et al., 1999). Compared
with other threshold-selection methods, maxSSS has higher
sensitivity and credibility (Liu et al.,
2016). We therefore used the mean maxSSS value of each model as the
threshold above which habitat was considered suitable. We then used the
natural breaks (Jenks) method to classify the suitable areas into three
grades: high, moderate, and low suitability
(Calka, 2018).
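The evaluation and classification steps can be sketched in Python, assuming model outputs are available as suitability scores for presence points (label 1) and background points (label 0). AUC is computed with the Mann-Whitney formulation, the maxSSS threshold by scanning every observed score, and the natural breaks with a dynamic-programming (Fisher-Jenks) optimization that minimizes within-class squared deviation. These are illustrative versions, not the routines used by MaxEnt or ENMeval, and the function names are hypothetical.

```python
import numpy as np


def roc_auc(y_true, scores):
    """AUC as the probability that a random presence point scores higher than
    a random background point (Mann-Whitney formulation; ties count as 0.5)."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def max_sss_threshold(y_true, scores):
    """Score threshold that maximizes sensitivity + specificity (maxSSS),
    searched over every observed score."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    best_t, best_sum = None, -np.inf
    for t in np.unique(s):
        pred = s >= t                            # predicted suitable
        sens = (pred & y).sum() / y.sum()        # true positive rate
        spec = (~pred & ~y).sum() / (~y).sum()   # true negative rate
        if sens + spec > best_sum:
            best_t, best_sum = t, sens + spec
    return best_t


def jenks_breaks(values, n_classes=3):
    """Natural breaks (Fisher-Jenks): split the sorted values into n_classes
    contiguous groups with minimal total within-class squared deviation.
    Returns the upper bound of each class."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    c1 = np.concatenate([[0.0], np.cumsum(x)])      # prefix sums of x
    c2 = np.concatenate([[0.0], np.cumsum(x * x)])  # prefix sums of x^2

    def ssd(i, j):
        """Sum of squared deviations of x[i:j] from its mean."""
        s, m = c1[j] - c1[i], j - i
        return (c2[j] - c2[i]) - s * s / m

    cost = np.full((n_classes + 1, n + 1), np.inf)
    start = np.zeros((n_classes + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):          # first j values in k classes
            for i in range(k - 1, j):      # last class is x[i:j]
                c = cost[k - 1, i] + ssd(i, j)
                if c < cost[k, j]:
                    cost[k, j], start[k, j] = c, i

    ends, j = [], n                        # backtrack class end indices
    for k in range(n_classes, 0, -1):
        ends.append(j)
        j = start[k, j]
    return [x[e - 1] for e in reversed(ends)]
```

In this workflow, `max_sss_threshold` would be applied to each of the ten replicates and the results averaged with `np.mean`; the cells whose suitability exceeds that mean threshold would then be passed to `jenks_breaks(values, 3)` to obtain the upper bounds of the low, moderate, and high classes. The exact search is O(k·n²), so for large rasters one practical option is to estimate the breaks from a random sample of cell values.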
We also used the percent contribution of each variable to identify the
dominant drivers of macaque distribution
(Phillips et al., 2006;
Zhang et al., 2019a).
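During training, MaxEnt assigns each increase in training gain to the variable(s) that the updated feature depends on and converts the accumulated totals to percentages at the end. A minimal sketch of that final normalization and ranking step, assuming the accumulated gain per variable is already known; the variable names and gain values below are hypothetical.

```python
def percent_contribution(gain_by_variable):
    """Convert accumulated gain per variable into percentages, ranked
    from the largest to the smallest contribution."""
    total = sum(gain_by_variable.values())
    if total <= 0:
        raise ValueError("total gain must be positive")
    pct = {name: 100.0 * gain / total for name, gain in gain_by_variable.items()}
    return sorted(pct.items(), key=lambda item: item[1], reverse=True)


# Hypothetical accumulated gains for three variables
ranking = percent_contribution({"bio12": 1.5, "bio1": 3.0, "hp": 0.5})
print(ranking)  # [('bio1', 60.0), ('bio12', 30.0), ('hp', 10.0)]
```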
We considered five geo-ecological regions in mainland East Asia:
Northwest, Southwest, Central, Coastal, and Northeast China
(Huang et al., 2021;
Zhang et al., 2022).