3.1. Data Collection
Data associated with the variables were collected from different
official sources for a total of 42 top countries that account for
448,989 COVID-19 cases as of 26 March 2020, accounting for 84.78% of
all reported cases worldwide. Date-wise infections, recoveries, and deaths
were collected from the website of the World Health Organization (WHO).
The data for infrastructure-centred variables, such as the number of
hospitals and the number of doctors, were taken from [51]. The
environment-based variables, average temperature and humidity since
the onset of COVID-19, were taken from [52]. The day-wise distribution
of COVID-19 cases extracted from WHO data was used to identify countries
that show signs of containment of the virus, based on a novel exponential
growth modelling approach. The raw data from these sources were also
consolidated, and the following variables were calculated so that they
could be used to train machine learning models: physicians per thousand
individuals, hospitals per thousand individuals, percentage of lockdown
days since the first contact, cases per million population, deaths per
million population, days since the first case, serious cases per
thousand infections, average temperature since the first infection, and
average humidity since the first infection.
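As a minimal sketch of how such per-capita and time-based variables might be derived, the Python snippet below assumes a hypothetical per-country record (`CountryRaw`) holding absolute counts, population, first-case and lockdown dates, and daily weather readings. The field names, the day-counting convention (plain date difference), and the clamping of lockdown days to the period after the first case are illustrative assumptions, not the paper's actual pipeline.

```python
import math
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class CountryRaw:
    """Hypothetical raw record for one country; field names are illustrative."""
    population: int
    physicians: int
    hospitals: int
    cases: int
    deaths: int
    serious_cases: int
    first_case: date
    lockdown_start: Optional[date]  # None if no lockdown had been imposed
    # Daily readings since the first infection (assumed already filtered)
    daily_temperature: List[float] = field(default_factory=list)
    daily_humidity: List[float] = field(default_factory=list)


def _mean(values: List[float]) -> float:
    # Return NaN rather than raising when no weather readings are available
    return sum(values) / len(values) if values else math.nan


def derive_features(c: CountryRaw, as_of: date) -> Dict[str, float]:
    """Turn raw counts into normalised variables like those listed in the text."""
    days_since_first = (as_of - c.first_case).days
    if c.lockdown_start is None:
        lockdown_days = 0
    else:
        # Assumption: only lockdown days after the first case are counted
        start = max(c.lockdown_start, c.first_case)
        lockdown_days = max(0, (as_of - start).days)
    lockdown_pct = (
        100 * lockdown_days / days_since_first if days_since_first > 0 else 0.0
    )
    return {
        # Multiply before dividing to keep round numbers exact in floating point
        "physicians_per_thousand": c.physicians * 1_000 / c.population,
        "hospitals_per_thousand": c.hospitals * 1_000 / c.population,
        "lockdown_pct_since_first_case": lockdown_pct,
        "cases_per_million": c.cases * 1_000_000 / c.population,
        "deaths_per_million": c.deaths * 1_000_000 / c.population,
        "days_since_first_case": float(days_since_first),
        "serious_per_thousand_infections": (
            c.serious_cases * 1_000 / c.cases if c.cases else math.nan
        ),
        "avg_temperature": _mean(c.daily_temperature),
        "avg_humidity": _mean(c.daily_humidity),
    }
```

For example, a hypothetical country with 5,000 cases in a population of 10 million yields 500 cases per million; keeping every variable on a per-capita or per-infection scale lets countries of very different sizes be compared by the same model.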
Table 1: Country-Wise Information on Infrastructure, Weather, Policy,
and Infection as of 26 March 2020