3 Results
3.1 Detection and isolation of PDCoV
Of the 314 samples from suspected PDCoV cases, 42 tested positive for PDCoV, a positive rate of 13.4%. Two complete PDCoV genomes were obtained in this study, with GenBank accession numbers MH715491 and MT263013. Moreover, one PDCoV strain (MT263013) that could be stably passaged in the LLC-PK cell line was successfully isolated and confirmed by indirect immunofluorescence assay (IFA) and reverse transcription-polymerase chain reaction (RT-PCR) (Fig. 1).
3.2 Distribution and phylogenetic analysis of PDCoV
PDCoV was first detected in Hong Kong, China, in 2012 and caused outbreaks in the United States in 2014. It has since been reported in more than 11 countries, indicating worldwide spread, and in China, pig farms in 14 provinces have been infected (Fig. 2). The maximum likelihood (ML) and Bayesian inference (BI) trees of the complete PDCoV genomes were almost identical (Fig. 3). Both trees classified the complete genomes into three major lineages: the Southeast Asia (SEA) lineage, comprising strains from Thailand, Vietnam and Laos; the America lineage, comprising strains from the USA, Peru, Japan and South Korea (JSK); and the China (CHN) lineage. We also observed that MW685622 and MW685624 from Ayiti (Haiti) were highly similar to KY065120 from Tianjin, China (99.8%), and that MW685623 from Ayiti was highly similar to KR150443 from Arkansas, USA (98.9%).
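The text does not state how the similarity values (99.8%, 98.9%) were computed. A minimal sketch, assuming percent identity is taken over an existing pairwise alignment and that columns containing a gap in either sequence are skipped, is shown below; `percent_identity` and the toy sequences are hypothetical and are not PDCoV data.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where either sequence has a gap are skipped (one common
    convention; other tools count gaps as mismatches instead).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not real PDCoV sequence.
query = "ATGCGTACGT-ACGTTAGC"
ref = "ATGCGTACGTTACGATAGC"
print(f"{percent_identity(query, ref):.1f}%")
```

Because gap handling changes the denominator, identity values from different tools are only comparable when the same convention is used.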
The Bayesian skyline plot showed that the estimated effective population size increased from 15 to 45 between 1989 and 2010; after a brief fluctuation, it remained at around 70 until 2019 (Fig. 4a).
Root-to-tip regression in TempEst (version 1.5.3) revealed clock-like temporal structure in the spike gene data set (n = 130, correlation coefficient = 0.56, R² = 0.32), indicating a temporal signal strong enough to estimate time-calibrated phylogenies with molecular clock models (Supplementary Fig. S1). As with the complete genomes, the maximum clade credibility (MCC) tree classified the spike genes into three major lineages (Fig. 4b). The MCC tree placed the origin of PDCoV slightly more often in the CHN lineage (49.05%) than in the SEA lineage (48.45%), but the two probabilities are nearly equal. By contrast, the Bayesian stochastic search variable selection (BSSVS) analysis pointed to the SEA lineage as the origin: it supported spread from Southeast Asia to the USA, Ayiti, Peru and China with high Bayes factor (BF) values and posterior probabilities (Fig. 5a, Supplementary Table S5), whereas spread from China had low BF values and posterior probabilities. Taken together, these results suggest that PDCoV originated in Asia, most likely in Southeast Asia, consistent with Xiao's phylogenetic analysis (Ye et al., 2020).
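Root-to-tip regression fits a straight line to root-to-tip distance against sampling date: the slope is a crude substitution-rate estimate, the x-intercept a crude date for the root, and for a simple linear fit R² is the square of the correlation coefficient (so r = 0.56 and R² = 0.32 agree up to rounding). The sketch below assumes the distances come from an already-rooted tree; TempEst additionally searches for the best-fitting root, which is not reproduced here. `root_to_tip_regression` and the data are hypothetical.

```python
from statistics import mean


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance on sampling date.

    Returns the slope (crude substitution rate), the x-intercept
    (crude root date), Pearson's r and R^2.
    """
    if len(dates) != len(distances) or len(dates) < 3:
        raise ValueError("need at least three paired observations")
    mx, my = mean(dates), mean(distances)
    sxx = sum((x - mx) ** 2 for x in dates)
    syy = sum((y - my) ** 2 for y in distances)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dates, distances))
    if sxx == 0 or syy == 0:
        raise ValueError("dates and distances must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return {
        "rate": slope,
        # Date at which the fitted line reaches zero divergence.
        "root_date": -intercept / slope if slope else float("nan"),
        "r": r,
        "r_squared": r * r,
    }


# Hypothetical tip dates (decimal years) and root-to-tip distances
# (substitutions/site); not the 130 spike sequences analysed here.
dates = [2012.5, 2014.2, 2015.8, 2017.1, 2018.6, 2019.3]
distances = [0.021, 0.026, 0.027, 0.033, 0.034, 0.038]
fit = root_to_tip_regression(dates, distances)
print({k: round(v, 4) for k, v in fit.items()})
```

A positive slope with a reasonable R² is only an exploratory check for temporal signal; the dated phylogeny itself comes from the Bayesian molecular clock analysis.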
The phylogeographic inference indicated that PDCoV might have originated around June of 1989 (June 1982–January 1996, 95% highest posterior density).
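In BSSVS, each migration route has a 0/1 indicator, and its Bayes factor compares the posterior odds that the indicator is 1 with the prior odds. The sketch below assumes the prior commonly used for BSSVS (a Poisson prior on the number of non-zero rates with mean ln 2 and offset K − 1, where K is the number of locations); the prior actually used in this study is not stated in the text. The function names and the posterior values in the usage lines are hypothetical.

```python
import math


def prior_inclusion_probability(k_locations: int, symmetric: bool = False) -> float:
    """Prior probability that any one rate indicator equals 1.

    ASSUMPTION: Poisson prior on the number of non-zero rates with
    mean ln(2) and offset K - 1 (a common BSSVS default).
    """
    if k_locations < 2:
        raise ValueError("need at least two locations")
    n_rates = k_locations * (k_locations - 1)  # directed (asymmetric) rates
    if symmetric:
        n_rates //= 2
    q = (math.log(2) + k_locations - 1) / n_rates
    if not 0.0 < q < 1.0:
        raise ValueError("too few locations for this prior")
    return q


def bayes_factor(posterior: float, prior: float) -> float:
    """Posterior odds of including a rate divided by its prior odds."""
    if posterior >= 1.0:
        return math.inf  # rate included in every posterior sample
    return (posterior / (1.0 - posterior)) / (prior / (1.0 - prior))


def is_supported(posterior: float, prior: float,
                 bf_cutoff: float = 3.0, pp_cutoff: float = 0.5) -> bool:
    """Apply the BF > 3 and posterior probability > 0.5 cutoffs."""
    return posterior > pp_cutoff and bayes_factor(posterior, prior) > bf_cutoff


# Six locations with directed rates; posterior values are hypothetical.
q = prior_inclusion_probability(6, symmetric=False)
for pp in (0.45, 0.60, 0.95):
    print(pp, round(bayes_factor(pp, q), 2), is_supported(pp, q))
```

Because BF depends on the prior odds, the same posterior probability gives different BF values under symmetric and asymmetric models or with a different number of locations.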
3.3 The spread of PDCoV
We reconstructed the worldwide spatial dispersal network of PDCoV and analyzed only transmission routes with BF > 3 and posterior probability > 0.5 (Ye et al., 2020). This yielded six discrete sampling locations and nine significant transmission routes (Fig. 6). SEA and JSK were the major sources of PDCoV. SEA was linked to four locations: Ayiti (BF = 68.10, migration rate = 0.924), JSK (BF = 9.48, migration rate = 0.938), the USA (BF = 6.97, migration rate = 0.936) and China (BF = 38425, migration rate = 1.345). JSK was linked to four locations: Peru (BF = 54.13, migration rate = 0.913), the USA (BF = 4.62, migration rate = 0.927), SEA (BF = 9.93, migration rate = 0.958) and Ayiti (BF = 70.21, migration rate = 0.933). In addition, there was a transmission route from the USA to JSK (BF = 7.52, migration rate = 3.553) (Fig. 5b, Supplementary Table S5). China and Southeast Asia are adjacent, roughly 2,000 km apart, and contact between them is relatively frequent, which may explain why the route from SEA to China had the highest BF value and a high migration rate. Notably, the route from the USA to JSK had the highest migration rate of all routes (3.553) even though the two regions are approximately 11,200 km apart, suggesting that PDCoV in JSK may have been introduced from the USA, consistent with the MCC tree. In Ayiti, pigs were reintroduced from North American populations, mainly from the USA and, to a lesser extent, Canada (Alexander, 1992), and the analysis also supported a link from JSK to Ayiti. PDCoV infections in Ayiti were therefore likely associated with the importation of pigs from the USA.
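The route counts above can be checked by tabulating the nine reported routes. The BF values and migration rates below are transcribed from this section; posterior probabilities are not listed in the text, so only the BF > 3 criterion is applied. `ROUTES` and `summarize` are illustrative names, not part of any phylogeography package.

```python
from collections import Counter

# (source, destination, Bayes factor, migration rate) as reported in Section 3.3.
ROUTES = [
    ("SEA", "Ayiti", 68.10, 0.924),
    ("SEA", "JSK", 9.48, 0.938),
    ("SEA", "USA", 6.97, 0.936),
    ("SEA", "China", 38425, 1.345),
    ("JSK", "Peru", 54.13, 0.913),
    ("JSK", "USA", 4.62, 0.927),
    ("JSK", "SEA", 9.93, 0.958),
    ("JSK", "Ayiti", 70.21, 0.933),
    ("USA", "JSK", 7.52, 3.553),
]


def summarize(routes, bf_cutoff=3.0):
    """Keep routes above the BF cutoff and tabulate the resulting network."""
    kept = [r for r in routes if r[2] > bf_cutoff]
    locations = {r[0] for r in kept} | {r[1] for r in kept}
    out_degree = Counter(r[0] for r in kept)  # number of outgoing routes per source
    top_bf = max(kept, key=lambda r: r[2])
    top_rate = max(kept, key=lambda r: r[3])
    return {
        "n_routes": len(kept),
        "n_locations": len(locations),
        "out_degree": dict(out_degree),
        "highest_bf": top_bf[:2],
        "highest_rate": top_rate[:2],
    }


print(summarize(ROUTES))
```

The summary reproduces the nine routes among six locations, four outgoing routes each for SEA and JSK, the highest BF for SEA to China and the highest migration rate for the USA to JSK.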
3.4 Protein structure analysis
Two GenBank sequences, LC216914 and LC216915, were obtained from porcine nasopharyngeal samples, suggesting that PDCoV may be able to cause respiratory infections in pigs (Woo et al., 2017). Sequence MK248485 was obtained from chickens, suggesting that PDCoV can infect chickens (Boley et al., 2020). Three sequences, MW685622, MW685623 and MW685624, were obtained from the blood of children, suggesting that PDCoV has the potential to infect humans (Lednicky et al., 2021). Comparing these six sequences with all other sequences revealed shared changes at seven amino acid sites. Fig. 7 shows the structure of the S protein from residues 52 to 1017 only, because the regions at the beginning and end of the protein are hydrophobic and can adversely affect protein solubility (Lednicky et al., 2021).
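The text does not describe how the seven sites were identified. One possible criterion, sketched below as an assumption, flags alignment columns where every sequence of interest carries the same residue and that residue differs from the most common residue among the remaining sequences; a shared gap ("-") is reported in the same way, which would capture a shared deletion. `shared_substitutions` and the toy alignment are hypothetical, and alignment columns need to be mapped back to protein residue numbers when the alignment contains gaps.

```python
from collections import Counter


def shared_substitutions(special, background):
    """Return (column, background_consensus, special_residue) tuples.

    A column is reported when all 'special' sequences share one residue
    and it differs from the most common residue in the background set.
    Columns are 1-based positions in the alignment.
    """
    length = len(special[0])
    if any(len(s) != length for s in special + background):
        raise ValueError("all sequences must be aligned to the same length")
    hits = []
    for i in range(length):
        special_residues = {s[i] for s in special}
        if len(special_residues) != 1:
            continue  # the sequences of interest disagree at this column
        (residue,) = special_residues
        consensus, _ = Counter(s[i] for s in background).most_common(1)[0]
        if residue != consensus:
            hits.append((i + 1, consensus, residue))
    return hits


# Hypothetical toy alignment, not PDCoV spike sequences.
background = ["MKPRNAS", "MKPRNAS", "MKPRNTS"]
special = ["MKLSNAS", "MKLSNAS"]
print(shared_substitutions(special, background))
```

Requiring all sequences of interest to agree is strict; a looser rule (for example, a majority) may be closer to the "similar changes" described above.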
The P-to-L substitution at residue 38 may affect the secondary structure of the protein: proline is an imino acid whose backbone nitrogen carries no hydrogen, so it cannot act as a donor in intra-chain hydrogen bonds and is frequently found in β-turns. The R-to-S substitution at residue 40 reduces steric hindrance, because serine has a much smaller side chain than arginine, and removes a positively charged side chain. An N-glycosylation site lies between residues 41 and 44, so the deletion at residue 45 may affect glycosylation (Lednicky et al., 2021). The A/V substitutions at residues 137 and 551 may eliminate specific van der Waals contacts, potentially increasing protein flexibility and the dynamic movement of S1. This change may represent a common mechanism that enhances dynamic movements, accelerating virus-membrane fusion and transmission (Lednicky et al., 2021; Thompson et al., 2021). The mutation at residue 670 altered steric hindrance at that site. Protein phosphorylation occurs mainly on serine, threonine and tyrosine residues, so the S-to-A substitution at residue 689 removes a phosphorylation site (Fig. 8 and Table 1).
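N-linked glycosylation sites are conventionally predicted from the sequon N-X-S/T, where X is any amino acid except proline. The sketch below scans a protein sequence for this motif and shows how a single-residue deletion can be checked for its effect on the motif; it looks only at the triplet itself and does not model other sequence or structural context. The functions and the toy sequence are hypothetical and do not represent the PDCoV S protein.

```python
import re

# N, then any standard residue except P, then S or T. The lookahead
# makes the match zero-width so overlapping sequons (e.g. "NNST") are found.
SEQUON = re.compile(r"(?=N[ACDEFGHIKLMNQRSTVWY][ST])")


def find_sequons(protein: str) -> list:
    """1-based start positions of N-X-S/T (X != P) motifs."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def delete_residue(protein: str, position: int) -> str:
    """Return the sequence with the residue at 1-based `position` removed."""
    if not 1 <= position <= len(protein):
        raise ValueError("position out of range")
    return protein[: position - 1] + protein[position:]


toy = "MKTNVSAGNPSLNNTA"  # hypothetical sequence
print(find_sequons(toy))                     # N-V-S at 4 and N-N-T at 13; N-P-S at 9 is excluded
print(find_sequons(delete_residue(toy, 5)))  # deleting V5 destroys the first motif
```

Note that positions downstream of a deletion shift by one, so motif positions should be compared after mapping both sequences to a common numbering.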