Learning from coronavirus data use and demand
by Caleb Rudow
Open Data Watch
27 June 2020
The coronavirus pandemic has brought to the world’s attention the need for accurate and timely data to identify vulnerable populations and guide decisions on limiting transmission and allocating resources. Policymakers and citizens alike are debating the drawbacks of different datasets, the merits of epidemiological models, the accessibility of data and research, and the biases in coronavirus testing results. But what do we know about the uptake of coronavirus data? And what can publicly available metrics tell us about the demand for and interest in these datasets? Open Data Watch (ODW) took a deep dive into the public and private data on coronavirus to answer these questions.
We monitored the online mentions of coronavirus datasets, website traffic for coronavirus data, and traffic to ODW’s coronavirus data products to understand the demand for coronavirus data and how the public conversation is being driven by data. The results show that although there was a large spike in interest in the early days of the pandemic, this interest has since ebbed.
Google search traffic for coronavirus data is dropping off
Search engines are one of the biggest sources of traffic for websites, including those that hold coronavirus data, and search volume generally reflects the public’s interest in a topic. According to Google Trends, a tool that analyzes the popularity of search queries in Google over time, the volume of searches related to coronavirus data has dropped from its peak in late March and early April. The graph below highlights this trend and shows how searches for “coronavirus data” peaked and then declined, as did other coronavirus-related search terms, like “COVID-19 data.” These search patterns, taken together, signal a possible decline in interest in coronavirus data. “Coronavirus data” is by far the most popular of the ten coronavirus data-related search terms we evaluated in Google Trends, so we use this term throughout this research to track how the public is using and talking about data on the coronavirus. You can also play with the data and add new search terms in Google Trends through this link. Try “toilet paper” as a search term if you really want to see what exponential growth looks like. While it could be the case that people searched for and have now found the coronavirus data that they need, other available metrics on mentions of “coronavirus data” online and website traffic paint a similar picture to the Google Trends results.
Mentions of coronavirus data on the internet are dropping off
As part of ODW’s work as a data watch organization, we track mentions of coronavirus-related keywords online to understand how the conversation about data-driven decision making is shifting over time. Mentions of “coronavirus data” (the most-searched coronavirus data-related keyword) on Twitter, blogs, news articles, and around the web have fallen from a late-March peak of 400 to 500 mentions a day to an average of about 100 a day. Mentions of “COVID-19 models” show a similar downward trend in public discussions. The tool we use, Brand24, tracks only exact mentions of a phrase, so mentions of “data on the coronavirus” or any other variation are not counted here, which may explain why the daily number of mentions is lower than expected. Combining all of the possible variations of coronavirus-related data keywords would give a higher total number of mentions per day, but because we believe these phrases are proxies for the larger conversation, we would expect to see a similar downward trend. There have been, however, a few spikes in traffic resulting from the accusations of manipulated coronavirus data in Florida and Georgia (mid-May). We have also seen spikes in traffic as cases have gone up in the southern region of the United States. But, generally, there is a clear downward trend in mentions of coronavirus data-related keywords online and less public conversation on coronavirus data.
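The effect of exact-phrase tracking can be illustrated with a small Python sketch. It does not use Brand24 or reproduce how that tool works internally; the sample posts, the list of variants, and the function names are all invented for illustration. It only shows why counting one exact phrase gives a lower total than counting several variations of it.

```python
import re

def count_exact(texts, phrase):
    """Count case-insensitive occurrences of one exact phrase across texts."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return sum(len(pattern.findall(text)) for text in texts)

def count_any(texts, phrases):
    """Sum the exact-phrase counts for each phrase in a list of variants."""
    return sum(count_exact(texts, phrase) for phrase in phrases)

# Hypothetical posts, made up for this example (not real tracked mentions).
sample_posts = [
    "New coronavirus data shows cases rising in the South.",
    "Where can I find data on the coronavirus for my county?",
    "State officials dispute the COVID-19 data released last week.",
    "Coronavirus data dashboards are everywhere now.",
]

# Hypothetical variant list; the article does not give the full set.
variants = ["coronavirus data", "data on the coronavirus", "COVID-19 data"]

exact_total = count_exact(sample_posts, "coronavirus data")
variant_total = count_any(sample_posts, variants)

print(exact_total)    # 2
print(variant_total)  # 4
```

In this toy sample, the exact phrase catches only half of the posts that are plainly about coronavirus data. The level differs, but both counts would fall together if the underlying conversation shrank, which is the sense in which one phrase can act as a proxy.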
Monitoring website traffic for coronavirus resources provides insights on use
On a more granular scale, we can track how people are accessing ODW’s coronavirus-related materials that are focused on open data and data-driven decision making. This moves beyond just interest and public discussions on data, to numbers of visitors and traffic to coronavirus data resources, a more direct measure of data use. You can see the same trend below: a huge spike in interest and then the traffic begins to tail off. We have seen similar trends in partner organizations that have shared their website traffic numbers with us.
Estimates from NSOs on website traffic show a range of traffic patterns
We have also done research and talked with our partners in national statistical offices (NSOs) about their traffic and found more mixed results. ODW scanned NSO websites around the world for coronavirus-related keywords (COVID-19 data, coronavirus statistics, and five other similar terms) and found that only 14 percent of NSO sites mentioned these keywords. Although this isn’t a metric of use, if someone were looking for coronavirus data on a website and couldn’t find any occurrence of these terms, the odds that they would use that site to find coronavirus data would be low. Further, estimates from NSO partners show that while some countries have seen an increase in traffic, others have not. The United Kingdom Office for National Statistics, for example, has noted a huge increase in traffic, up 200 to 500 percent since last year, but other NSOs have seen no significant increase in traffic. The NSO traffic trends probably depend on many factors: whether the NSO website has coronavirus data on it (some countries have hosted data on the ministry of health website); the quality and timeliness of the data; how well the NSO is marketing and disseminating this content; and how many coronavirus cases and deaths there are in the country.
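As a rough illustration of the two calculations behind the figures in this section, the Python sketch below computes the share of sites that mention any keyword and the percent increase between two traffic counts. It is not ODW’s actual scanning method. The site names, page text, and visit counts are invented, and the keyword list holds only the two terms named above because the other five are not listed.

```python
# Only the two keywords named in the text; the other five are not listed.
KEYWORDS = ["covid-19 data", "coronavirus statistics"]

def mentions_any(page_text, keywords=KEYWORDS):
    """Return True if the page text contains at least one keyword."""
    text = page_text.lower()
    return any(keyword in text for keyword in keywords)

def share_mentioning(pages):
    """Percent of sites whose text mentions at least one keyword."""
    if not pages:
        return 0.0
    hits = sum(mentions_any(text) for text in pages.values())
    return 100.0 * hits / len(pages)

def percent_increase(before, after):
    """Percent increase from an earlier count to a later one."""
    return 100.0 * (after - before) / before

# Hypothetical sites and page text, invented for this example.
sample_pages = {
    "nso-a.example": "Latest COVID-19 data and weekly bulletins",
    "nso-b.example": "Census results and labour force survey",
    "nso-c.example": "Consumer price index, monthly release",
    "nso-d.example": "Agricultural statistics yearbook",
}

print(share_mentioning(sample_pages))  # 25.0
print(percent_increase(1000, 3000))    # 200.0
print(percent_increase(1000, 6000))    # 500.0
```

Read this way, a 200 percent increase means traffic tripled and a 500 percent increase means it grew sixfold, which helps put the range reported by NSO partners in perspective.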
Where do we go from here and how do we increase interest in and use of coronavirus data?
Although there was tremendous initial interest in finding coronavirus data, this interest has not been maintained. And while there are some amazing success stories of increases in traffic, like an estimate from Our World in Data of a 700 percent increase in traffic to their site, NSOs and other government sources may not be experiencing similar increases. It is telling, for example, that in the United States the most popular coronavirus data sites are not from the Centers for Disease Control and Prevention or other government entities, but from academic institutions, like Johns Hopkins University, and citizen-created dashboards, like the one created by 17-year-old Avi Schiffmann. Data users may also have become discouraged because they have not found the data they are looking for. Global Health 50/50 regularly scans the Web for sex-disaggregated data on coronavirus cases and deaths, but has so far found official data from only 102 countries, of which 53 provide complete data. This is a sign that NSOs and government agencies are not providing data in the form or with the frequency that people need. If NSOs don’t take this opportunity seriously, global data integrators and modelers will have to continue to fill the vacuum.
We should not expect the same levels of interest in coronavirus data that we saw at the start of the pandemic, but we may be able to move the needle to increase data interest and use. A good start would be to follow best practices for dissemination: delivering data along with visualizations; paying attention to timeliness and quality; moving beyond the “if you open it, they will come” mindset towards a more active dissemination strategy; and building partnerships with journalists and influencers to help get key findings and datasets out to the public.
Gartner’s hype cycle may also be a helpful way to think about this process and coronavirus data use. In some ways, the hype cycle mimics what we’ve seen in our research on coronavirus data. According to the hype cycle, we are now in the ‘trough of disillusionment,’ but with effort and by leveraging best practices on data dissemination, we may be able to move the needle towards a ‘plateau of productivity.’ The worst-case scenario, however, is that we are not seeing a hype cycle at all: traffic doesn’t edge upward, the early spike in interest turns out to have been a one-time event, and there is no recovery towards the ‘plateau of productivity’ and increased use. Where things move from here will, in large part, depend on how the data community responds with more and better dissemination efforts to get the right datasets, in the right formats, to the right people to make better decisions to fight the virus.
Figure 4: Gartner Hype cycle
There is a lot to learn from the patterns of use and demand for coronavirus data. As part of ODW’s work program, we are planning to conduct more research regarding data use and its impact on policy. Stay tuned for more information and connect with us if you have any ideas or suggestions.