“Harmony” can mean a lot more than something that sounds nice. Wikipedia suggests the term derives from the Greek harmonía, meaning “joint, agreement, concord”, from the verb harmozo, “to fit together, to join”. That’s an excellent definition for the context of statistics, where statistical harmonization is about how to fit data together. Having just visited several European statistical offices and similar organizations, it’s clear to me that it’s a challenge to harmonize anything but the most trivial statistics.
The quality of data can influence the effectiveness of government and the quality of our lives. Which schools have classrooms that are substandard? Which areas have the highest vaccination rates? Which hospitals have the longest waiting lists? The answers can be fairly straightforward, and accurate responses can greatly improve government services. On the other hand, unfortunately for those responsible for providing the answers, the statistics can be difficult to put together, and it is only after harmonization that many statistics can become a useful foundation for good decision making.
Data harmony is what is needed in order to develop, use, and share interesting and meaningful statistics. Before organizations reach this kind of data sharing nirvana, a few incremental improvements, such as adopting rich open data standards, can improve the quality and reliability of shared data.
Statistics are derived by collating data from multiple sources that first need to be made comparable. For example, in Germany the first population census in over 20 years will be based on administrative registers owned by regional and municipal statistical offices throughout the country. The registers are maintained independently and don’t all necessarily define citizens in the same way. The data needs to be harmonized into a coherent whole. That means that the census project needs to put together datasets from hundreds of disparate sources. Those familiar with fitting just two or three datasets together will have a feel for how painful this can be.
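To make the idea of "fitting data together" concrete, here is a minimal Python sketch of harmonizing two registers that describe residents differently. Everything in it is an assumption made for illustration: the field names (`id`, `town`, `nationality`, `person_no`, `gemeinde`, `citizen_flag`), the flag values, and the target `Resident` record are hypothetical and are not taken from any real German register or from the census project.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Resident:
    """Hypothetical common target record that both registers are mapped onto."""
    person_id: str
    municipality: str
    is_citizen: bool


def from_register_a(row: Dict) -> Resident:
    # Assumed layout of register A: string ids, a nationality code.
    return Resident(
        person_id=row["id"],
        municipality=row["town"].strip().title(),
        is_citizen=row["nationality"] == "DE",
    )


def from_register_b(row: Dict) -> Resident:
    # Assumed layout of register B: numeric ids, a yes/no citizen flag.
    return Resident(
        person_id=str(row["person_no"]),
        municipality=row["gemeinde"].strip().title(),
        is_citizen=row["citizen_flag"] in ("Y", "J"),
    )


def harmonize(register_a: List[Dict], register_b: List[Dict]) -> List[Resident]:
    """Map both sources onto the common record and return one combined list."""
    return ([from_register_a(r) for r in register_a]
            + [from_register_b(r) for r in register_b])
```

Even in this toy case each source needs its own mapping function, which hints at why doing the same for hundreds of independently maintained registers is painful.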
If integrators and providers agree on data standards up front, they avoid the huge challenge that the German census team will have to confront. Parts of the SDMX (an open data standard) model are specifically dedicated to rigorously defining such agreements. Success stories such as the Joint External Debt Hub, and the pilot for the European census hub, show that it can work. These projects are built on top of agreements to adhere to certain data and metadata structure definitions (SDs), where each provider is responsible for harmonizing the data they contribute. The advantage of this scenario is that the resulting data has broader applicability, and multiple different consumers of the data are not required to perform the same harmonization work (often with varying degrees of quality). Unfortunately, getting agreement on SDs can take a very long time. It might be as little as a few days for a small, bilateral exchange, but it could also be years for a complex agreement between major organizations.
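As a rough illustration of what agreeing on a structure up front buys, the Python sketch below checks a provider's observation against an agreed set of dimensions and code lists before it is shared. It is a deliberately simplified stand-in, not the SDMX information model or any SDMX library API: the `AGREED_STRUCTURE` dictionary, the chosen dimensions, the code lists, and the `validate` helper are all assumptions made for this example.

```python
from typing import Dict, List, Set

# Hypothetical agreement: each dimension maps to its allowed codes.
AGREED_STRUCTURE: Dict[str, Set[str]] = {
    "REF_AREA": {"DE", "FR", "IT"},
    "FREQ": {"A", "Q", "M"},
    "CURRENCY": {"EUR", "USD"},
}


def validate(observation: Dict, structure: Dict[str, Set[str]] = AGREED_STRUCTURE) -> List[str]:
    """Return a list of problems; an empty list means the observation conforms."""
    problems = []
    for dim, codes in structure.items():
        if dim not in observation:
            problems.append(f"missing dimension {dim}")
        elif observation[dim] not in codes:
            problems.append(f"{dim}={observation[dim]!r} is not in the agreed code list")
    # The measured value itself must be a number.
    if not isinstance(observation.get("OBS_VALUE"), (int, float)):
        problems.append("OBS_VALUE must be numeric")
    return problems
```

With a check like this on the provider side, each contributor catches mismatches such as a country name where a code is expected before the data reaches any consumer, so the harmonization work is done once rather than by every recipient.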
The diagram below is an attempt to sketch out a path toward more mature practices for data exchange and show some of the benefits gained along the way. Does it make sense to you? And where does your organization fit: are your data exchange practices in harmony, or more of a clanging dissonance? Either way, it would be great to hear about experiences in the world of data exchange from different perspectives, and if/how you think SDMX or other open data standards might help.
While comprehensive multilateral exchange agreements facilitated by SDMX may represent the ultimate in efficiency and data quality standards, a relatively simple data sharing exchange can still deliver benefits without requiring agreement between all parties. All that is required is for participants to adhere to the SDMX technical standard - something that is relatively easy to do with the right tools. Such simple beginnings can also help organizations prepare for exchange agreements to be worked out over time. Small steps or large, it is a journey worth taking.


Do you have any comments about how SDMX could be applicable for the International Aid Transparency Initiative? http://www.aidtransparency.net/
Doug, I had a quick browse of the “Consultation paper for data definitions and format”. In summary, I think SDMX is well worth looking at. You might also like to look at SDMX-HD (http://www.sdmx-hd.org/), which is an example of SDMX being deployed for the purpose of monitoring and evaluation in a donor community (the health domain). My understanding is that the difference between SDMX-HD and SDMX is just the additional content-oriented guidelines that are being developed specifically for the health domain. Here’s a useful article on it: http://www.npoki.org/tag/sdmx-hd/.
Some more specific responses… If I understand it correctly, there are three key areas (from bottom of p. 2):
“Donors will publicly disclose regular, detailed and timely information on volume, allocation and, when available, results of development expenditure to enable more accurate budget, accounting and audit by developing countries.” - I’d say this is a very strong area for SDMX. Financial institutions are one of the main user groups and, indeed, five out of seven of the standard’s sponsors, so it’s well suited to this kind of work.
“Beginning now, donors and developing countries will regularly make public all conditions linked to disbursement” - there is nothing specific built into the SDMX model to help with this, but you could use annotations to document the conditions. You can do this as part of an SDMX metadata reporting structure, which allows the content and timing of future data releases to be published. You may want more than this, but it depends how “actionable” you need this information to be. If it is just to inform people, then it’s probably enough.
“Beginning now, donors will provide developing countries with regular and timely information on their rolling three- to five-year forward expenditure….” - I don’t know how much SDMX is used to exchange predicted results, as opposed to numbers that already exist. However, given the model has very clear mechanisms for handling versions of data and time series, it should work OK.
Hi Doug.
It looks like IATI are very closely linked with DFID in the UK and also with UNDP. Given that context, if you were interested in assistance with reviewing
(a) technical aspects of harnessing SDMX-ML to represent data and metadata structures which meet your needs (as opposed to developing an approach which is entirely IATI-specific)
- SDMX caters for needs such as multilingual content in a standard, machine-actionable manner
(b) applying appropriate SDMX content standards (e.g. countries, currencies etc.) within data and metadata structures that meet your needs
then World Bank, as an SDMX sponsor, would probably be closest in terms of content/application.
IATI’s use case might (or might not) also be relevant to IMF and other sponsors reviewing possible requirements and approaches to offer an “SDMX Lite” to address concerns that SDMX in its current form may be “over-engineered” to easily support very simple use cases (although that engineering is instrumental to it meeting other use cases).
UNICEF’s DevInfo team have done a lot on delivering capabilities underpinned by SDMX to developing countries, and have worked with DFID, but my understanding is that they mainly focus on statistical data rather than financial “program management” data.
If you’d be interested in following up some of these aspects but you’re not sure of contacts in regard to SDMX then let me know.
Coming back to the theme of Don’s original post, one advantage of such an approach is that IATI should then be able to more readily draw on, and integrate with, relevant data from other sources.