Official statistics provide an indispensable element in the information system of a democratic society, serving the Government, the economy and the public with data about the economic, demographic, social and environmental situation. To this end, official statistics that meet the test of practical utility are to be compiled and made available on an impartial basis by official statistical agencies to honor citizens’ entitlement to public information.
Principle 1, United Nations Fundamental Principles of Official Statistics
Criteria for Open Official Statistics
Official statistics are indeed “an indispensable element,” and often the largest and most valuable component of “the information system of a democratic society.” They include the censuses of population and housing, business, and agriculture; national accounts; labor statistics; money and banking and international trade statistics; education and health statistics; and much more. Most are produced by the national statistical office and allied agencies with legal authority to collect, compile, and disseminate data. Generically we refer to them as “official statistics” and to the responsible agencies as the “national statistical system.” As quintessential public goods, they are unlikely to be produced by the private sector, and they are used efficiently only when they are made freely available. Fundamental Principle 1 says that they should be “made available on an impartial basis by official statistical agencies….” In recognizing “citizens’ entitlement to public information,” Principle 1 makes a strong statement for open access to all information produced by the national statistical system.
Most statistical systems have moved beyond paper publications to electronic dissemination. Two decades ago publication of data sets or reproduction of paper publications on CD-ROMs was considered state-of-the-art. Today all governments and most statistical agencies maintain some form of web presence. As the internet reaches more people and the power of computers and mobile devices increases, there are more opportunities and more formats for publication of official statistics. But not all published data are readily accessible and usable. They may be published in formats that are difficult to access; they may lack clear definitions and other explanatory metadata; they may be incomplete or out of date or just poorly organized. What are the criteria for assessing the openness of official statistics?
Because most statistical agencies have opened some portion of their databases to the web, we are interested in establishing a set of generally applicable and reproducible criteria for assessing the openness of websites managed by national statistical agencies. By “generally applicable” we mean that they can be applied to any website that is used to disseminate official statistics. By “reproducible” we mean impartial observers using these criteria should produce similar assessments.
Defining Open Data
Many definitions of “open data” have been proposed. Some are long and prescriptive, others are short statements of principles, and still others outline action plans that implicitly define open data. Here we are concerned exclusively with open public data or, more to the point, official statistics published on government websites. Publications on paper, CD-ROMs, or other media may be viewed as complementary but incidental to the web publication.
Taking the common elements of various open data definitions and considering their application to official statistics published on the Internet or World Wide Web, we arrive at a set of criteria for assessing the openness of websites maintained by official statistical agencies. These criteria do not attempt to cover all the attributes of a “good” website. They do not treat the aesthetics of the website or value-added products (such as infographics, dashboards, or e-publications) that are often part of a comprehensive data dissemination program. Our purpose here is to identify the elements that define open access to official statistics.
1. Legal authority and independence
What is the legal authority of the agency to collect and publish data? Is there a statistical law that defines that authority and does the law guarantee the independence and impartiality of the statistical agency? Is the statistical law published and accessible through the website? Is the agency responsible for production and dissemination of the data clearly identified?
2. Complete
Do the data available on the website represent the full range of the agency’s statistical holdings or only a subset? Are primary data available at the finest granularity consistent with protecting privacy? If the national statistical system is a federated system composed of multiple statistical offices, is there a convenient way to locate all of the important subsets of data?
3. Timeliness and time span
Are data regularly updated? Is the updating schedule published? Are consistent historical series maintained and available?
4. Adherence to standards
Does the agency employ recognized international standards and definitions for the compilation and documentation of statistics? If exceptions are made to international standards, are they fully documented?
5. Availability of metadata
Are metadata describing the relevant characteristics (including standards and definitions) of the data readily available with the data? Is other objective, interpretive commentary available, as recommended by the Fundamental Principles of Official Statistics? Is the commentary free of political or partisan considerations?
6. Selectability
Can the user specify a unique selection of data from a larger data set? Is it possible to download a complete data set?
7. Technical accessibility
Are the data published in commonly used formats that facilitate machine processing? Is there at least one non-proprietary option for retrieving data? Is there a published API?
8. Licensing
Are the terms under which the data may be used or reused published? Are they non-discriminatory? Does the license permit the use and reuse of data without restriction (except, possibly, requiring attribution)? Are data in standard formats provided free of charge? Do users have to register to access or download data?
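One way the eight criteria above might be applied reproducibly is as a simple checklist that each assessor fills in independently. The following is a minimal sketch, not a proposed scoring method. The function names, the equal weighting of criteria, and the convention of recording each question as favorable or unfavorable to openness are all assumptions made for illustration. Note that a literal "yes" is not always favorable: a "yes" to the registration question under Licensing would be recorded as unfavorable.

```python
# Criterion names follow the eight headings above. Equal weighting of
# criteria and questions is an illustrative assumption, not part of the text.
CRITERIA = [
    "Legal authority and independence",
    "Complete",
    "Timeliness and time span",
    "Adherence to standards",
    "Availability of metadata",
    "Selectability",
    "Technical accessibility",
    "Licensing",
]

# An assessment maps each criterion to a list of booleans, one per question,
# where True means "the answer is favorable to openness".
Assessment = dict[str, list[bool]]


def score_assessment(answers: Assessment) -> dict[str, float]:
    """Return, per criterion, the share of questions answered favorably."""
    scores = {}
    for criterion in CRITERIA:
        responses = answers.get(criterion, [])
        # A criterion with no recorded answers scores 0.0 (hypothetical rule).
        scores[criterion] = sum(responses) / len(responses) if responses else 0.0
    return scores


def overall_score(scores: dict[str, float]) -> float:
    """Unweighted mean of the per-criterion scores."""
    return sum(scores.values()) / len(scores)


def agreement(a: Assessment, b: Assessment) -> float:
    """Share of individual questions on which two assessors agree."""
    pairs = [
        (x, y)
        for criterion in CRITERIA
        for x, y in zip(a.get(criterion, []), b.get(criterion, []))
    ]
    return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0
```

Under these assumptions, comparing two independent assessments of the same website with `agreement` gives a rough check on whether the criteria are "reproducible" in the sense used earlier: impartial observers should arrive at similar answers.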
Next steps
Two recently published indexes of data openness, the Open Data Index (ODI) and the Open Data Barometer (ODB), have applied versions of these criteria to selected national and city-level datasets, but neither attempts to assess the full range of data produced by a national statistical office. (For further discussion of the ODI and ODB, see the accompanying article “Indexes of Data Quality and Openness.”)
The openness of data available on a statistical website is one dimension of an overall assessment of a statistical system. Together with openness, coverage (time span and topics available), data quality, and the functionality of the website determine the “practical utility” of published statistics. All four dimensions should be considered when assessing the performance of a statistical system. Such an assessment could be conducted as an expert evaluation (employing a few experts with specialized, local knowledge), but it would be more in keeping with the spirit of openness to use crowdsourcing. In practice some combination of expert evaluation, crowdsourcing, and self-reporting may be necessary.