AI and Open Data Reloaded: Our Wishes for 2024
by François Fonteneau, PARIS21, and Shaida Badiee, ODW
22 December 2023
It seems AI is making “open data” fashionable again, like a familiar old tune that climbs back to the top of the charts after being forgotten for years.
About 20 years ago, the two of us, Shaida Badiee and François Fonteneau, invested our time and energy in spearheading open data in official statistics. Whether it was opening up the World Bank's flagship World Development Indicators or helping 60 countries of the Global South publish well-curated, anonymized survey microdata files in metadata-rich, web-based catalogues, accessible data was our shared goal. It brought joy to expert data users and relevance to expert data producers. Research thrived, and citations of datasets multiplied. That work has continued and grown ever since.
Two decades later, we are back together as open data advocates.
So, what is new with AI? And why does it need open data? After all, AI has been around even longer than open data, and longer than the two of us.
Earlier this month, we had the privilege of participating in exceptional workshops organized by our Google colleagues.
Google Data Commons is revolutionizing the way everyday individuals share their data, making it easy for them to do so. It empowers ordinary people to discover information using everyday language, while giving AI-enabled machines the tools to learn from this wealth of knowledge, weave stories out of it, and make it accessible to billions. In polarized societies, where buying guns and drugs online is often easier than finding reliable, non-partisan data on crucial issues like climate change or gender equality, it is imperative for experts to go the extra mile to ensure their relevance and visibility.
Few people have the expertise and perseverance needed to navigate the intricate pathways that lead to the excellent data curated by world-class experts in statistical offices and international organizations, who know both the subject matter and the computation. The challenge is that the tools and systems these experts develop may be less intuitive for the public.
As we sat through the presentations on AI, a handful of questions came to mind:
- What more can official statisticians do to make their data visible, help AI reduce its hallucinations, and keep it from amplifying misinformation? And how can AI-enabled tools like Google's Data Commons motivate and support them?
- Why is good data still so buried today, so invisible to optimized search engines, despite all the efforts made?
- How can technology giants speed up the way their search engines reference and value quality, ex-ante harmonized, well-governed data? How do we maintain quality assurance when publishing mash-ups of data from multiple sources?
- And lastly, and perhaps most importantly, how can we make sure this is done with the people and for the people?
One of the presenters at Google called development data “messy”.
We don’t disagree. But we hope that, with AI, we can find ways to clean up the mess rather than simply learn to live with it. You can only sweep so much under the rug before you start to feel lumps under your feet and your path becomes less comfortable.
So, as 2024 begins, let us hope for a world where technology giants and statistical experts work together to surface quality data and move us all one step forward.