By Eric Swanson
World Economics has released a Data Quality Index (DQI), rating the quality of GDP estimates for 154 countries. The announcement, authored by Dariana Tani, recognizes how influential GDP data are and acknowledges some of the serious questions that have been raised about the validity of GDP and national accounts data by academic observers and official bodies. The DQI is presented as a “new way to judge which countries (sic) GDP you can trust.” Therefore, it is striking, and perhaps ironic, that the DQI depends heavily on GDP, both directly and through the proxy indicators used to construct it. The grouping and scaling of the underlying data also pose serious problems for the interpretation of the index.
The DQI is a composite indicator of five components. Two capture the practice of national accounts compilers: the version of the system of national accounts (SNA) in use and the national accounts’ base year. The other three are proxy indicators intended to measure factors that may affect the reliability of GDP estimates: the size of the informal or “shadow” economy, which measures how much of GDP may go unrecorded; Transparency International’s (TI) Corruption Perception Index (CPI), which is intended to gauge the possibility of improper government influence on GDP estimates; and GDP per capita (in PPP terms), which is proposed as a proxy for the resources available to the statistical system. Two additional components measuring the size of the government and financial sectors are under development for 2016.
The Hidden Weight of GDP
A reliable indicator of statistical quality would be very useful. GDP is a measure of economic capacity and, measured in per capita terms, an important indicator of development progress and human welfare. So how does the DQI do? According to the DQI, the highest rated country is Switzerland, followed by the United States and Norway. Like the top three, the next 40 countries are all classified as high-income by the World Bank. Only seven high-income countries are ranked lower. The lowest-ranking of them are Russia at 104 and Venezuela at 117. The highest-ranking middle-income country is South Africa at 44, followed by Argentina at 45. Low-income countries all rank near the bottom. As so often happens in development, it appears that the best way to improve data quality is to increase GDP. This isn’t an accident. It stems from how the DQI is constructed.
In the published version of the index, each of the five components has equal weight, so GDP per capita determines at least 20 percent of the score. But in practice, GDP per capita has much greater weight. Both the estimate of the size of the informal economy and TI’s CPI are highly correlated with GDP per capita, as the correlation matrix below shows. The only two components that are weakly correlated with GDP per capita are the national accounts base year and the SNA version, which are also the only components that directly reflect the statistical methods used to estimate GDP.
| Correlation coefficients | Base Year | SNA version | Shadow Economy | GDP per Capita | Corruption |
| --- | --- | --- | --- | --- | --- |
| Base Year | 1.00 | | | | |
| SNA version | 0.55 | 1.00 | | | |
| Shadow Economy | 0.26 | 0.39 | 1.00 | | |
| GDP per Capita | 0.31 | 0.47 | 0.71 | 1.00 | |
| Corruption | 0.35 | 0.52 | 0.66 | 0.80 | 1.00 |
| Overall Score | 0.63 | 0.74 | 0.79 | 0.86 | 0.85 |
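The Overall Score row of the matrix can be roughly cross-checked from the pairwise correlations alone. The sketch below is an illustration, not part of the DQI methodology: it assumes the five components are standardized and summed with equal weights, in which case the correlation of each component with the total is its row sum in the correlation matrix divided by the square root of the sum of all entries. The published components are not standardized, so only approximate agreement should be expected.

```python
from math import sqrt

labels = ["Base Year", "SNA version", "Shadow Economy", "GDP per Capita", "Corruption"]

# Lower triangle of the published correlation matrix (values from the table).
lower = [
    [1.00],
    [0.55, 1.00],
    [0.26, 0.39, 1.00],
    [0.31, 0.47, 0.71, 1.00],
    [0.35, 0.52, 0.66, 0.80, 1.00],
]

# Correlations with the overall score, as reported in the last row of the table.
reported = dict(zip(labels, [0.63, 0.74, 0.79, 0.86, 0.85]))

n = len(labels)
# Fill in the full symmetric matrix from the lower triangle.
R = [[lower[max(i, j)][min(i, j)] for j in range(n)] for i in range(n)]

# ASSUMPTION: components are standardized and equally weighted.
# Then corr(component_i, sum) = (sum of row i) / sqrt(sum of all entries).
total = sum(sum(row) for row in R)
implied = {labels[i]: sum(R[i]) / sqrt(total) for i in range(n)}

for name in labels:
    print(f"{name:15s} implied={implied[name]:.2f}  reported={reported[name]:.2f}")
```

Under this assumption the implied correlations fall within about 0.02 of the reported row, which suggests that the high correlation of the overall score with GDP per capita follows largely from the correlations among the components themselves.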
The strong correlations of the shadow economy and corruption measures with GDP per capita reveal that GDP per capita influences the DQI score far more than is first apparent. A linear regression of the DQI score on GDP per capita produces an R-squared of 75 percent, with a coefficient of 0.548 on GDP per capita. In other words, GDP per capita alone accounts for three-quarters of the variation in the DQI, leaving only 25 percent to be explained by whatever the other four components contribute independently of it.
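For readers who want to check the arithmetic: in a regression with a single explanatory variable, R-squared equals the square of the correlation coefficient. The sketch below demonstrates the identity on a small made-up data set (the numbers are hypothetical and are not the DQI data), then applies it to the 0.86 correlation between the overall score and GDP per capita reported in the matrix.

```python
from math import sqrt
from statistics import mean

def simple_ols(x, y):
    """Regress y on a single x; return (slope, intercept, r_squared, pearson_r)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R-squared from the residual sum of squares.
    ssr = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1 - ssr / syy
    # Pearson correlation computed separately.
    pearson_r = sxy / sqrt(sxx * syy)
    return slope, intercept, r_squared, pearson_r

# Hypothetical illustrative scores, NOT the DQI data set.
gdp_score = [10, 20, 30, 40, 50, 60, 70, 80]
dqi_score = [22, 35, 31, 48, 52, 50, 66, 71]

slope, intercept, r2, r = simple_ols(gdp_score, dqi_score)
print(f"slope={slope:.3f}  R-squared={r2:.3f}  r squared={r ** 2:.3f}")

# Applying the identity to the reported correlation of 0.86.
implied_r2 = 0.86 ** 2
print(f"R-squared implied by r=0.86: {implied_r2:.2f}")
```

The implied value of about 0.74 is consistent with the reported R-squared of 75 percent, allowing for rounding in the published correlation.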
Proxy Problems
Transparency International’s CPI has a well-deserved reputation for identifying misbehavior on the part of governments. But there is no way to determine whether government interference with the calculation of GDP is proportional to the perception of corruption recorded by the CPI. Interference in the calculation of GDP certainly happens. Tani cites examples of suspect inflation rates from Ukraine, Syria, Venezuela, and the well-known problems with Argentina’s consumer price index. But the CPI was not designed to identify bad statistics. A search on the TI website for references to statistics or GDP turned up no references to corrupt influences on GDP or statistics in general.
Measuring the size of the informal economy and accounting for it as part of GDP is a challenge for any statistical office. Failing to do so would degrade the quality of the resulting GDP. To measure the size of the shadow economy as a percentage of national income, the DQI uses estimates by Schneider and Williams (2003). While this is a plausible measure of the challenge of measuring the informal economy, we do not know how much of the shadow economy may already be captured in the published GDP estimates. In any case, Schneider and Williams’ estimates are highly correlated with GDP per capita, as the correlation matrix above shows. Thus, the amount of new information added through the proxy measure is significantly reduced.
For many countries, the lack of resources for statistical operations and staffing is likely the most important cause of poor data quality. But GDP per capita is a very general proxy for the domestic resources committed specifically to statistics. It also misses the aid that external donors provide to support improvements in statistics in developing countries. Including GDP per capita, already highly correlated with the other components, virtually assures that the richest countries will come out near the top of the index.
Cautions on Scaling and Grouping
In constructing the DQI, the component indicators have been grouped into deciles and scaled from 0 to 100. The TI CPI scores are the sole exception: they are left ungrouped, as they are originally published on a range of 0 to 100. Because the indicators are used as ordinal series, their magnitudes have no relevance. Grouping destroys some potentially relevant ordinal information. Nor is it clear why the values of the underlying series have been mapped into specific groupings. For example, countries using the 2008 version of the SNA are assigned a value of 100, while those using the 1993 version are scored 70, those using the 1968 version are scored 30, and those using the 1953 version are scored 0. Is the 40-point difference between 70 and 30 supposed to indicate some greater deficiency than the 30-point differences between the other groupings? Why not simply rank them 1 through 4?
Scaling is a related problem, as differences in mean values result in unintended weighting. A common procedure in constructing indexes from disparate components is to normalize the series to a common mean and standard deviation. The resulting composite indicator (the overall score) can then be rescaled to any range desired. Although the data used to construct the DQI are all scaled from 0 to 100, their mean values are quite different. For example, the means of the SNA and Base Year indicators are both 72.3, while that of GDP per capita is 37.1. The means of the informal economy and corruption indicators are 46.1 and 43.9, respectively. This unintended weighting means that, on average, the SNA and Base Year components have a slightly greater weight in the final index simply because their average values are greater. Of course, GDP per capita remains the dominant factor.
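The normalization described here is the familiar z-score. A minimal sketch follows, using made-up component scores rather than the DQI data and assuming the population standard deviation as the divisor (a sample standard deviation would work equally well). The function name `standardize` is ours.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical component scores on a 0-100 scale, NOT the DQI data.
sna_version = [100, 70, 70, 100, 30, 70]   # a series with a high mean
gdp_per_capita = [90, 20, 10, 60, 0, 40]   # a series with a low mean

z_sna = standardize(sna_version)
z_gdp = standardize(gdp_per_capita)

for name, raw, z in [("SNA version", sna_version, z_sna),
                     ("GDP per capita", gdp_per_capita, z_gdp)]:
    print(f"{name}: raw mean={mean(raw):.1f}, raw sd={pstdev(raw):.1f}, "
          f"standardized sd={pstdev(z):.2f}")
```

After standardization both series have the same mean and spread, so neither enters a composite with extra weight merely because of the scale on which it happened to be scored.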
Expunging GDP and Normalizing Indicators
Taking these criticisms into account, let us consider a new index that removes the indirect influence of GDP per capita and normalizes all indicators before inclusion in the index. We will otherwise use the same data, including the arbitrary groupings. First, we normalize the existing series to zero mean and unit variance. Then we regress each of the other four components on GDP per capita. Taking the standardized residuals from these regressions removes the systematic component of GDP per capita. The residuals also have zero mean and unit variance. We use them to construct two indexes, one with and one without the normalized GDP per capita series, and then rescale both from 1 to 100. (The residual data series and new indices are available on request in Excel or CSV formats.)
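The steps above can be sketched as follows. This is an illustration under stated assumptions, not the exact calculation behind the results reported here: the data are made up, the components are combined by a simple equal-weighted average, and the final rescaling is a min-max mapping onto 1 to 100. All function and variable names are ours.

```python
from statistics import mean, pstdev

def standardize(xs):
    """Zero mean, unit (population) standard deviation."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def residuals(y, x):
    """Residuals from an OLS regression of y on x with an intercept."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

def rescale(xs, lo=1.0, hi=100.0):
    """ASSUMPTION: min-max rescaling onto [lo, hi]."""
    mn, mx = min(xs), max(xs)
    return [lo + (hi - lo) * (x - mn) / (mx - mn) for x in xs]

# Hypothetical scores for six countries, NOT the DQI data.
gdp = [5, 15, 30, 45, 70, 90]
components = {
    "base_year":      [60, 40, 80, 70, 90, 80],
    "sna_version":    [70, 30, 70, 100, 70, 100],
    "shadow_economy": [20, 10, 40, 50, 80, 90],
    "corruption":     [25, 20, 35, 55, 75, 88],
}

# Step 1: normalize every series.
z_gdp = standardize(gdp)
z_comp = {k: standardize(v) for k, v in components.items()}

# Step 2: regress each component on GDP per capita, standardize the residuals.
resid = {k: standardize(residuals(v, z_gdp)) for k, v in z_comp.items()}

# Step 3: build the two indexes (ASSUMPTION: equal-weighted averages).
n = len(gdp)
without_gdp = rescale([mean([r[i] for r in resid.values()]) for i in range(n)])
with_gdp = rescale([mean([r[i] for r in resid.values()] + [z_gdp[i]])
                    for i in range(n)])

for i in range(n):
    print(f"country {i + 1}: with GDP={with_gdp[i]:6.1f}  without GDP={without_gdp[i]:6.1f}")
```

By construction each residual series is uncorrelated with GDP per capita, so the index without GDP reflects only how a country performs relative to what its income level would predict.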
The results show some surprising changes in rankings. Using all five components, the top three countries are Finland, Iceland, and Rwanda. India rises to the ninth position; the United States falls to fourteenth; and South Africa is right behind at fifteenth. Omitting GDP per capita entirely, Rwanda rises to number one, followed by Lesotho and India, and the United States drops to fifty-third place. Some very rich countries, such as Saudi Arabia and Kuwait, fall to near the bottom.
These results should not be read as absolute measures of data quality. A fair interpretation could be that they reflect the performance of countries on the other four components, given their GDP per capita level. So poor countries that score relatively high on the TI CPI, have relatively small shadow economies, or have updated their base year and adopted a recent version of the SNA all do well. Assuming that these four indicators tell us something about the reliability of GDP and other national accounts aggregates (and certainly the base year and SNA version have relevance), then the revised measures might help us to identify success stories. But the large changes from the original DQI should also serve as a warning of how little we really know.
The Way Forward
To have evidence of the quality of data, we need more data about what actually goes on in the statistical office. We need to know whether it has employed a properly balanced sources and uses matrix, whether a business registry exists and when the last census of business was conducted, whether the models used to estimate agricultural output have been updated, and many more details. It might also be useful to know the level of training of the statistical office staff and their average salaries. And what are the institutional arrangements in place to protect the independence and integrity of the statistical office?
Some of this information is available through the Data Quality Assessment Frameworks (DQAFs) maintained by the International Monetary Fund (IMF). The IMF’s Article IV reports on the observance of standards and codes also provide field reports on the performance of statistical offices. PARIS21 tracks donor funding of statistical capacity programs. Self-assessments and peer reviews by statistical offices are also valuable sources. Compiling these and other sources of information could support a robust assessment of the quality of national accounts statistics, whether or not they were turned into a single index number.
None of this will be of much use unless it results in better practices on the part of statistical offices. Statistical offices should own the process but, especially in poor countries, they need support – from their governments, from their funders, and from the agencies that promulgate international standards. Feedback from data users and independent watchdog organizations can play a helpful role. In any case, much hard work remains to be done.