“Harmony” can mean a lot more than something that sounds nice. Wikipedia suggests the term derives from the Greek harmonía, meaning “joint, agreement, concord”, from the verb harmozo, “to fit together, to join”. That’s an excellent definition for the context of statistics, where statistical harmonization is about how to fit data together. Having just visited several European statistical offices and similar organizations, it’s clear to me that it’s a challenge to harmonize anything but the most trivial statistics.
The quality of data can influence the effectiveness of government and the quality of our lives. Which schools have substandard classrooms? Which areas have the highest vaccination rates? Which hospitals have the longest waiting lists? The answers can be fairly straightforward, and accurate responses can greatly improve government services. Unfortunately for those responsible for providing the answers, the statistics can be difficult to put together, and many statistics become a useful foundation for good decision making only after harmonization.
Data harmony is needed to develop, use, and share interesting and meaningful statistics. Before organizations reach this kind of data-sharing nirvana, a few incremental improvements, such as adopting rich open data standards, can improve the quality and reliability of shared data.
Statistics are derived by collating data from multiple sources that first need to be made comparable. For example, in Germany the first population census in over 20 years will be based on administrative registers owned by regional and municipal statistical offices throughout the country. The registers are maintained independently and don’t all necessarily define citizens in the same way. The data needs to be harmonized into a coherent whole. That means that the census project needs to put together datasets from hundreds of disparate sources. Those familiar with fitting just two or three datasets together will have a feel for how painful this can be.
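To give a feel for what "making data comparable" involves, here is a minimal sketch in Python. Everything in it is hypothetical: the two registers, their field names, and their codings are invented for illustration and are not taken from the German census project. It only shows the general shape of the work, which is to map each source's conventions onto one common representation before combining them.

```python
# Hypothetical example: two municipal registers that record the same facts
# with different field names and codings. All names and values are invented.
COMMON_SEX_CODES = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "2": "F",
}


def harmonize_register_a(row):
    """Register A (assumed) stores 'sex' as free text and 'birth_year' as a number."""
    return {
        "sex": COMMON_SEX_CODES[row["sex"].strip().lower()],
        "birth_year": int(row["birth_year"]),
    }


def harmonize_register_b(row):
    """Register B (assumed) stores 'gender_code' as 1/2 and 'dob' as 'YYYY-MM-DD'."""
    return {
        "sex": COMMON_SEX_CODES[str(row["gender_code"])],
        "birth_year": int(row["dob"][:4]),
    }


# Made-up sample rows, one list per source.
register_a = [{"sex": "Male", "birth_year": 1980}, {"sex": "f", "birth_year": 1975}]
register_b = [{"gender_code": 2, "dob": "1990-05-17"}]

# After harmonization every row has the same fields and the same codes,
# so the two sources can finally be combined and counted together.
harmonized = [harmonize_register_a(r) for r in register_a] + [
    harmonize_register_b(r) for r in register_b
]
```

With two sources this is a pair of small functions. With hundreds of sources, each needing its own mapping and its own edge cases, the scale of the problem becomes clear.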
If integrators and providers agree on data standards up front, they avoid the huge challenge that the German census team will have to confront. Parts of the SDMX model (SDMX is an open data standard) are specifically dedicated to rigorously defining such agreements. Success stories such as the Joint External Debt Hub and the pilot for the European census hub show that this can work. These projects are built on agreements to adhere to certain data and metadata structure definitions (SDs), where each provider is responsible for harmonizing the data they contribute. The advantage of this scenario is that the resulting data has broader applicability, and the many consumers of the data do not each have to repeat the same harmonization work (often with varying degrees of quality). Unfortunately, getting agreement on SDs can take a long time: perhaps only a few days for a small, bilateral exchange, but possibly years for a complex agreement between major organizations.
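The idea of an up-front agreement can be sketched in a few lines of Python. This is not SDMX syntax and does not use any SDMX library; the `AGREED_STRUCTURE` dictionary, the dimension names, and the `validate` function are simplified, hypothetical stand-ins for a structure definition that providers would check their data against before contributing it.

```python
# A simplified, hypothetical stand-in for an agreed structure definition.
# Real SDMX structure definitions are far richer; this only illustrates
# the principle that every provider checks against the same agreement.
AGREED_STRUCTURE = {
    "dimensions": {
        "REF_AREA": {"DE", "FR", "IT"},  # assumed code list for reference area
        "SEX": {"M", "F", "T"},          # assumed code list for sex
    },
    "measure": "OBS_VALUE",
}


def validate(observation, structure=AGREED_STRUCTURE):
    """Return a list of problems; an empty list means the observation conforms."""
    problems = []
    for name, allowed in structure["dimensions"].items():
        if name not in observation:
            problems.append(f"missing dimension {name}")
        elif observation[name] not in allowed:
            problems.append(f"{name}={observation[name]!r} not in agreed code list")
    measure = structure["measure"]
    if not isinstance(observation.get(measure), (int, float)):
        problems.append(f"{measure} must be numeric")
    return problems


# Made-up observations: one that conforms and one that does not.
good = {"REF_AREA": "DE", "SEX": "F", "OBS_VALUE": 1234.5}
bad = {"REF_AREA": "Germany", "OBS_VALUE": "n/a"}
```

Here `validate(good)` returns an empty list, while `validate(bad)` reports three problems: an unknown area code, a missing dimension, and a non-numeric value. Because the provider runs this check once at the source, consumers do not each have to rediscover and repair the same issues.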
The diagram below is an attempt to sketch out a path toward more mature practices for data exchange and to show some of the benefits gained along the way. Does it make sense to you? And where does your organization fit: are your data exchange practices in harmony, or more of a clanging dissonance? Either way, it would be great to hear about experiences in the world of data exchange from different perspectives, and whether and how you think SDMX or other open data standards might help.
While comprehensive multilateral exchange agreements facilitated by SDMX may represent the ultimate in efficiency and data quality, a relatively simple data sharing exchange can still deliver benefits without requiring agreement between all parties. All that is required is for participants to adhere to the SDMX technical standard, which is relatively easy to do with the right tools. Such simple beginnings can also help organizations prepare for exchange agreements to be worked out over time. Small steps or large, it is a journey worth taking.

