A Year of Data Insights
in the Time of COVID-19
by Caleb Rudow, Elettra Baldi,
and the Open Data Watch Team
11 March 2021
The COVID-19 pandemic has brought to the world’s attention, like never before, the need for accurate and timely data to guide decisions and solve problems. It has also been a stress test, like no other, for statistical systems around the world. Policymakers and citizens are talking about the drawbacks of different datasets, the merits of epidemiological models, accessibility to data and research, the problems of biases in COVID-19 testing results, and the best ways to manage and learn from data in a world inundated with it.
At Open Data Watch, we have been monitoring the conversation about COVID-19 data this past year through our Data in the Time of COVID-19 resource page. From our research — finding relevant COVID-19 data articles, performing analyses on COVID-19 data, and being a part of the global conversation on the use of data to fight the pandemic — we have learned a lot. This year has shown both the power of data and the barriers to data use.
The one-year mark of the World Health Organization’s declaration of COVID-19 as a pandemic is a somber time for reflection, but also an opportunity to take stock of what has been learned so that we can use data to bring about the end of the pandemic in a way that is quick and equitable. Here we summarize some of the lessons we have learned.
! Focus on providing information on gender and other disaggregations to understand the impact of COVID-19 across society.
The pandemic has not affected everyone equally. Advocates call for more disaggregated data to highlight these gender, racial, ethnic, and economic disparities.
Gender data are a critical tool to fully understand COVID-19 transmission and its impacts on women and girls. The World Health Organization found that sex-disaggregated data are often inconsistent, incomplete, or hard to access. Open Data Watch and Global Health 50/50 found that only 41 countries reported sex-disaggregated COVID-19 cases and deaths as of June 2020, a figure that improved to 69 countries by late July 2020 and reached 97 according to estimates in late February 2021. The analysis found that high-income countries report most of their COVID-19 cases and deaths by sex and that this reporting by sex declines as you go down the income ladder. The lack of data is particularly worrisome in light of increasing violence against women and girls and because of gender disparities in areas such as the economy, education, employment, and unpaid care. Without these data it is difficult to observe and address these disparities.
Not only are sex-disaggregated data missing, but critical data on low-income communities, incarcerated populations, people of color, Indigenous communities, and disabled people are also scarce. This is problematic because we know from what little data we do have that many of these groups have been hit particularly hard by the pandemic. Data show that Latinx and African-American populations in the U.S. are three times more likely to be infected with COVID-19 than White populations. In addition, early data on vaccine distribution show that Black and Hispanic people make up a disproportionately small share of those who have received the first dose of the COVID-19 vaccine.
Leaving no one behind first requires knowing who is being left behind, something that is not possible without data. What we have learned in the pandemic should both scare us and inspire us to action to fill in data disaggregation gaps.
! Increase funding for COVID-19 data and improve long-term funding for national statistical systems.
The high demand for data on the primary and secondary effects of the pandemic makes a strong case for more and better financing for national statistical systems (NSS) to supply these data, with national statistical offices (NSOs) playing a crucial role in coordinating efforts and supplying data. Studies prior to the pandemic showed that NSOs did not have the funding to fulfill the data demands for the SDGs, a condition only exacerbated by the financial strain that the pandemic has placed on their budgets and those of their development partners. The most recent PARIS21 Partner Report on Support to Statistics (PRESS) shows that funding for data and statistics on COVID-19 has not increased to meet the new data demands. Countries across the world need to work with partners to give NSOs the financial support they need to supply the data for a better response to the pandemic and its after-effects. The Bern Network, a global alliance for more and better financing for development data, is working through this crisis to bring countries and donors together to build stronger statistical capacity during the COVID-19 pandemic and beyond. The pandemic presents an opportunity to strengthen NSOs so that they have the capacity to respond better and more quickly to crises in the future, but only if the global community comes together and rises to the challenge to provide the funding that NSOs desperately need.
! Support well-functioning foundational and administrative data systems to ensure the availability of timely data on the pandemic.
The pandemic highlighted gaps in administrative data sources, such as civil registration and vital statistics (CRVS) and health information systems, that are key data sources for providing timely data on cases and deaths and are critical for public health and planning policy interventions. Without these data, researchers are worried that a “silent epidemic” is occurring. For example, all 54 countries in Africa together have recorded fewer deaths than France. Without well-functioning CRVS systems, we simply don’t know how many deaths are occurring in Africa, and this deprives public health officials and others of critical information needed to respond to the pandemic. In the absence of administrative data, researchers have worked to find other methods for estimating mortality rates, such as counting graves via satellite or even calculating the number of coffins sold. Unfortunately, these methods are less accurate than a well-functioning CRVS system. This is not only a problem in Africa; an analysis of mortality statistics during the pandemic in 14 countries in Europe and the United Kingdom finds a high number of excess deaths, or deaths above normal for a given time period, in 13 of the 14 countries, a signal of underreporting of COVID-19 mortality. A strong focus should be placed on creating better CRVS systems to track the pandemic and other public health crises.
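The idea of excess deaths mentioned above can be made concrete with a small calculation. The sketch below is only an illustration: it assumes the "normal" level is the simple average of the same period in earlier years (published analyses often use more elaborate baselines), and every number in it is hypothetical, not taken from any country's statistics.

```python
from statistics import mean


def excess_deaths(observed, baseline_counts):
    """Return (excess, expected) for one time period.

    expected: simple average of deaths in the same period of earlier years
              (an assumed baseline method, chosen for illustration).
    excess:   observed deaths minus that expected level.
    """
    expected = mean(baseline_counts)
    return observed - expected, expected


# Hypothetical all-cause death counts for one week, for illustration only.
baseline_counts = [1000, 1040, 980, 1020, 960]  # same week in five earlier years
observed = 1300                                 # same week during the pandemic
reported_covid = 200                            # officially reported COVID-19 deaths

excess, expected = excess_deaths(observed, baseline_counts)
percent_above_normal = 100 * excess / expected
unexplained = excess - reported_covid  # excess deaths not attributed to COVID-19

print(f"Expected deaths: {expected}")
print(f"Excess deaths: {excess} ({percent_above_normal:.0f}% above normal)")
print(f"Excess deaths beyond reported COVID-19 deaths: {unexplained}")
```

A large gap between excess deaths and reported COVID-19 deaths is the kind of signal that points to possible underreporting. The calculation itself depends on complete death registration, which is exactly what a well-functioning CRVS system provides.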
! Improve coordination between data providers, implement standards for interoperability, leverage new data sources, with NSOs as leaders.
The COVID-19 pandemic has led to an unprecedented demand for data. Official COVID-19 datasets on cases and deaths are at the center of this, but the data landscape has been flooded with many other data sources to understand the effects of the pandemic, such as mobile phone data, sewage data, Google search data, and Facebook data. In many ways this is indicative of larger trends in international statistics: non-official statistical sources are growing in prominence and the data landscape is becoming more complicated. While new non-official sources can help fill in critical data gaps, they can also complicate the data for decision makers due to different quality standards and other governance issues. There is a need for leadership and coordination to leverage all available data assets for decision making. As the leader in national statistical systems, the NSO is well placed to help this process and take on the role of data steward.
Much like the need for coordination of data, there is also a particular need for coordination of standards of interoperability. The United States has been a key example of this, as the lack of a national strategy for COVID-19 reporting led to many instances of duplication of effort and an inability to compare data across systems. At the international level, the lack of guidelines and standards for new variants has made it difficult for the world to track their spread. Building interoperable data systems and coordination methods for setting standards before the next crisis is critical to mounting a quick data-driven response.
! Balance privacy and access through smart, flexible data contracts.
When the COVID-19 pandemic started, NSOs, governments, academics, and relevant stakeholders rushed to collect timely data to track the pandemic’s impact and design effective response policies. Less thought was given to how to protect individuals’ privacy and prevent data ‘mission drift’ that shifts towards using the data for other purposes. For example, Singapore declared that data collected during COVID-19 contact tracing can be used for criminal investigations, raising doubts about whether personal contact tracing data would be used for other purposes. The Mexico City government implemented a COVID-19 tracing app that required users to scan a QR code with their mobile device upon entering a closed space, raising privacy concerns among citizens and the international community. While many of these measures are necessary for health surveillance, they can have a long-lasting impact and set norms about the pursuit of data at all costs. Data access and data privacy are always a balancing act, and we need smart data agreements and licenses to reflect this.
! Focus on facilitating the access and use of COVID-19 data.
COVID-19-related data must be made available in easy-to-use formats that are machine-readable and non-proprietary so that data can reach the public, journalists, policymakers, and other key stakeholders. Journalists are often the first to communicate COVID-19 data to the public. Major outlets such as the New York Times and BBC created their own COVID-19 trackers to share information as efficiently and accurately as possible. This highlighted the need for journalists to have the skills to analyze and integrate data into their stories and to collaborate closely with statisticians.
On the other hand, because data are so powerful, we have also seen people hide data, manipulate data, and delete data to fit their narrative. Even in the best cases, we have seen people ignore COVID-19 data because of psychological biases, misinformation, or political inconvenience. To deliver on the promises of the data revolution and achieve open data for decision making, these factors need to be better investigated and understood so that better dissemination methods, political incentive structures, and other strategies can be developed to ensure that data get used.
Conclusion
The past year of the pandemic has changed the world in previously unimaginable ways. As the world mourns those it has lost, struggles to contain and guard against future outbreaks, and coordinates vaccine rollout, data will remain central to the conversation. COVID-19-related data issues touch all points along the data value chain, from collection to impact. And these issues can motivate policymakers, civil society organizations, and the public at large to focus discussions on missing data, disaggregated data, data systems, privacy, interoperability, data use, and the role of NSOs. Open Data Watch will continue to monitor these conversations, and through our work, provide insights and expertise to build back better from this pandemic and achieve the SDGs in the Decade of Action.