Last Update: 29 April 2020 11:00 BST
Find my Medium writeup here: https://towardsdatascience.com/4-ways-to-analyse-pandemic-data-covid-19-ccf4bce32a33
The primary focuses of this study are (i) to enable statistical comparisons across affected geographies from different perspectives, in order to highlight benchmark practices in combating Covid-19; and (ii) to connect datasets and sources from various fields (epidemiology, public health and social behaviour), repurposed to draw correlations and co-occurrences and to support more advanced modelling.
The data sources, analysis and experimental outputs will be updated and added periodically. Visualisation libraries include matplotlib, seaborn, plotly express and folium.
A growth factor that stays below 1.00 for a prolonged period signals that the curve has passed its inflection point, where exponential growth gives way to a logistic curve. In other words, we're flattening the curve!
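As a rough illustration of the idea above, here is a minimal sketch. It assumes the growth factor is defined as the number of new cases on one day divided by the number of new cases on the previous day (a common definition, not confirmed by this writeup). The function name and the figures are made up for illustration.

```python
def growth_factors(cumulative_cases):
    """Return day-over-day growth factors from a cumulative case series.

    Assumed definition: new cases today / new cases yesterday.
    """
    # Daily new cases are the differences between consecutive cumulative totals.
    new_cases = [b - a for a, b in zip(cumulative_cases, cumulative_cases[1:])]
    factors = []
    for prev, curr in zip(new_cases, new_cases[1:]):
        # The ratio is undefined when the previous day had no new cases.
        factors.append(curr / prev if prev > 0 else None)
    return factors


# Made-up cumulative totals, for illustration only.
cumulative = [100, 150, 225, 325, 400, 450]
factors = growth_factors(cumulative)
print(factors)
```

In this toy series the factor drops below 1.00 once daily new cases start to shrink, which is the pattern the paragraph above describes.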
Population statistics are a key factor in epidemiology, as population size and density affect the rate at which infectious diseases can spread in a given region. Therefore, to accurately assess how effectively regions are containing the virus, it is important to consider population as a normalising factor.
The above interactive chart takes the Top 30 countries by confirmed cases, and ranks them by their confirmed cases normalised by the total population per million.
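The ranking described above can be sketched as follows. This is only an illustration: the country names, case counts and populations are placeholders, and the helper name `cases_per_million` is hypothetical rather than taken from the project code.

```python
def cases_per_million(confirmed, population):
    """Confirmed cases per one million inhabitants."""
    return confirmed * 1_000_000 / population


# Placeholder figures: name -> (confirmed cases, population).
countries = {
    "Country A": (50_000, 10_000_000),
    "Country B": (200_000, 300_000_000),
    "Country C": (20_000, 1_000_000),
}

# Step 1: keep the top N countries by raw confirmed cases (N = 2 in this toy example).
top_n = sorted(countries, key=lambda c: countries[c][0], reverse=True)[:2]

# Step 2: re-rank that subset by cases per million population.
ranked = sorted(top_n, key=lambda c: cases_per_million(*countries[c]), reverse=True)

for name in ranked:
    print(name, round(cases_per_million(*countries[name]), 1))
```

Note that filtering by raw case count first means a small country with a high per-capita rate (Country C here) can drop out before the normalisation step.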
Reliability in number of reported confirmed cases is dependent on rigorous and widespread testing. The rate at which an infectious spread can be detected and measured is vital for coordinating efforts to contain it.
In this section, we explore which countries have taken an early initiative on testing, as well as the reliability of reported figures by comparing confirmed cases to the number of tests performed.
As expected, a quick correlation analysis across 57 countries (latest source) between Total Tests and Total Confirmed cases found a strong correlation of 0.88. The more tests are done, the more cases are detected. Countries that have increased their testing efforts are less likely to have their case figures under-reported.
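For readers who want to reproduce this kind of check, here is a minimal sketch of a Pearson correlation coefficient in plain Python. It assumes the analysis used Pearson's r on the raw totals, which this writeup does not state explicitly. The input numbers are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up totals per country, for illustration only.
total_tests = [10_000, 50_000, 120_000, 400_000]
total_confirmed = [500, 3_000, 4_000, 30_000]
r = pearson(total_tests, total_confirmed)
print(round(r, 2))
```

With heavily skewed totals like these, a few large countries can dominate the coefficient, so it may be worth comparing against a correlation on log-transformed values.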
As with any disease (and in this case a pandemic), knowing early on how severe the infection is helps limit the spread and mitigate further damage.
The Test:Confirmed Ratio measures the number of Tests conducted relative to the number of Confirmed cases. Countries ranked high, towards the right of the graph below, have conducted a significant volume of tests despite having a relatively low number of Confirmed cases.
The interactive chart below (logarithmic scale) illustrates where countries are positioned by volume of confirmed cases against volume of tests performed.
Notice the trend line of 'lightly coloured' countries. Could this be the "Test:Confirmed sweet spot" that countries should maintain in order to control the viral spread? The average Test:Confirmed ratio of the 7 countries on this line (S. Korea, Australia, UAE, S. Africa, New Zealand, Bahrain, Lithuania) is 43.91. That is, for every confirmed case, roughly 44 people should be tested as best practice in order to track and control the spread.
Data updated as at 07 April 2020. Ourworldindata.org manually reviews data across national reports, including the most recent estimates.
Testing definition: The most common tests for COVID-19 involve taking a swab from a patient’s nose and throat and checking them for the genetic footprint of the virus. They are called “PCR tests”. The first PCR tests for COVID-19 were developed very rapidly – within two weeks of the disease being identified. They are now part of the World Health Organisation (WHO)’s recommended protocol for dealing with the disease.
Caveats on Testing data:
Novel ways of tracking human behaviour have arisen with the increased usage of search engines and social media.
Here I analyse Search Topics and Concepts related to Covid-19 to assess public awareness and action in response to the viral spread.
The simple plot below charts the timeline of increase in confirmed infectious cases against Google search popularity related to Covid/coronavirus.
It is SHOCKING to observe a full one-month time lag between the spike in confirmed cases in mid-February and the public's search interest. Searches only peaked around 12 March, when WHO announced a Pandemic-level threat.
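One simple way to put a number on such a lag is to compare the dates on which each series peaks. The sketch below assumes that peak-to-peak distance is an acceptable proxy for the lag, which is a simplification and not necessarily how the chart was read. The values are toy placeholders, not actual case counts or Google Trends scores.

```python
from datetime import date


def peak_date(series):
    """Return the date with the highest value in a {date: value} mapping."""
    return max(series, key=series.get)


def lag_days(cases, searches):
    """Days from the peak in cases to the peak in search interest."""
    return (peak_date(searches) - peak_date(cases)).days


# Toy placeholder series, for illustration only.
daily_new_cases = {
    date(2020, 2, 10): 3,
    date(2020, 2, 13): 9,
    date(2020, 2, 16): 4,
}
search_interest = {
    date(2020, 2, 13): 20,
    date(2020, 3, 1): 60,
    date(2020, 3, 12): 100,
}

lag = lag_days(daily_new_cases, search_interest)
print(lag)
```

A positive result means search interest peaked after cases did. A cross-correlation over shifted series would be a more robust alternative when the peaks are noisy.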
The next interactive visualisation below is not as important as the one above, but it does seem more fun.
Here you can hit the 'Play' button and observe how Covid-related Google searches changed over time relative to the increase in the number of confirmed cases.
Google API last called on 1 April 2020.
Google Trends analyzes the popularity of top search queries in Google Search across various regions and languages. The website uses graphs to compare the search volume of different queries over time.
The scores awarded by Google Trends on the "interest over time" line graph express the popularity of that term over a specified time range. Google Trends scores are based on the absolute search volume for a term, relative to the number of searches received by Google. The scores have no direct quantitative meaning.
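To make the idea of a relative score concrete, here is a hedged sketch of a Trends-style index: each period's share of total searches is scaled so that the highest share becomes 100. Google does not publish its exact computation, so this only illustrates the principle. The function name and the numbers are made up.

```python
def relative_interest(term_searches, total_searches):
    """Scale a term's share of all searches to a 0-100 index (peak = 100)."""
    # Share of all searches that were for the term in each period.
    shares = [t / total for t, total in zip(term_searches, total_searches)]
    peak = max(shares)
    if peak == 0:
        return [0 for _ in shares]
    return [round(100 * s / peak) for s in shares]


# Made-up search counts per period, for illustration only.
term = [10, 50, 200, 100]
total = [1000, 1000, 2000, 2000]
scores = relative_interest(term, total)
print(scores)
```

Because every value is expressed relative to the peak, two score series can only be compared with each other if they were normalised together.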
Another group of key factors to consider is the responsiveness of governments towards the international threat of Covid-19. The data used in this section is from the Oxford COVID-19 Government Response Tracker (see more below).
This graph illustrates the normalised aggregated regulatory actions put into place by governments worldwide over the past few months. Observe which regulatory actions were put into place first and which followed, and how these changed over time.
The box in the middle describes what I refer to as the 'key action period', with the left edge being WHO's announcement of a Global Emergency on 30 January 2020, and the right edge being WHO's announcement of a Pandemic-level threat.
In addition to the Oxford team's Stringency Index, I am proposing a separate measure of lead time in government response to Covid-19: the Government Responsiveness Metric. The purpose of the metric is to test which form of government response, implemented at which time (and with what delay relative to specific key events), is most significant in relation to Covid infection rates. (Experiments are ongoing; the methodology will be shared once results are consolidated.)
The chart below illustrates the Responsiveness Metric (measured in number of lead days leading up to a sequence of key milestone events) aggregated at the continent level, across 9 types of government actions.
Example: the Australia region implemented S7 (international travel controls) and S5 (public info campaigns) up to an average of 20 days before certain infection milestones (shown as minus 20), but took longer to implement S2 (closing workplaces).
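Since the methodology has not been published yet, the following is only a guess at the general shape of a lead-time calculation, not the actual Responsiveness Metric. It assumes lead days are the action date minus the milestone date, so negative values mean the action came first. The records, region names and milestone dates are hypothetical.

```python
from collections import defaultdict
from datetime import date


def lead_days(action_date, milestone_date):
    """Days between an action and a milestone (negative = action came first)."""
    return (action_date - milestone_date).days


def mean_lead_by_group(records):
    """Average lead days per (region, action) pair."""
    groups = defaultdict(list)
    for region, action, action_date, milestone_date in records:
        groups[(region, action)].append(lead_days(action_date, milestone_date))
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Hypothetical records: (region, action code, action date, milestone date).
records = [
    ("Region A", "S7", date(2020, 2, 1), date(2020, 2, 21)),
    ("Region A", "S7", date(2020, 2, 5), date(2020, 2, 25)),
    ("Region A", "S2", date(2020, 3, 20), date(2020, 3, 10)),
]

averages = mean_lead_by_group(records)
print(averages)
```

In this toy data, S7 averages minus 20 (in place 20 days before the milestone), while S2 averages plus 10 (in place 10 days after it).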
Latest data as at 7 April 2020.
The Oxford COVID-19 Government Response Tracker (OxCGRT) aims to track and compare government responses to the coronavirus outbreak worldwide rigorously and consistently. The OxCGRT systematically collects information on several different common policy responses governments have taken and scores the stringency of such measures. Data is collected from public sources by a team of over one hundred Oxford University students and staff from every part of the world.