Rounding Methods for Online Data Dissemination

Introduction to statistical disclosure control

Statistical Disclosure Control (SDC) plays a fundamental role in protecting sensitive data whilst minimising information loss. As a field of research, it has existed for decades.

The current trend towards interactive, user-driven, ad hoc queries and the rise of the prosumer market (Ellenberger and Muir 2009) provide additional challenges for the data provider.

Data providers, including National Statistical Offices (NSOs), still face the challenge of minimising disclosure risk while maximising data usability. However, there is now commonly the additional requirement for techniques that can be applied successfully to ad hoc table generation (for example, online).

We consider some common approaches to disclosure control on both frequency (count) and magnitude data tables. SDC is more often discussed in terms of census data than survey data (which already has a degree of protection through survey participation/membership and weightings), but it can apply to both.

SDC Methods at a glance

In this section we provide an at-a-glance comparison of some common methods, such as Rkey perturbation, rounding, and suppression. This section is suitable for managers and decision makers, to provide a quick overview of the methods and their applicability.

Classes of method

SDC methods fall into two broad types: Safe Setting and Safe Data. Safe Data methods fall into the following classes (STR 2009):

  • Pre-aggregation – where cells’ classifiers are aggregated (recoded) to avoid cross-tabulations that contain small cell values. This can be effective but it is difficult to pre-empt queries. Also, the rise of the prosumer market leads to more people wanting ad hoc access to fine (if not micro) level data.
  • Restricted data access – through security measures or application of business rules to restrict or disallow certain queries.
  • Cell Suppression / Concealment – where sensitive cells and related cell values are simply removed from the report. This can lead to ‘swiss-cheese’ tables and can pose difficulties for analysts. This method is not appropriate for already sparse data.
  • Cell adjustment / Obfuscation – where sensitive cells, or a whole table, are varied slightly. This includes controlled tabular adjustment (CTA), Rkey perturbation (developed by the Australian Bureau of Statistics) and other rounding techniques.
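
As a rough illustration of the last two classes, the following Python sketch contrasts simple cell suppression with unbiased random rounding on a small frequency table. It is a minimal sketch, not the method of any particular NSO or software package: the function names, the suppression threshold of 3 and the rounding base of 3 are all assumptions chosen for the example, and only primary suppression is shown (the related, or secondary, cells mentioned above are not handled).

```python
import random


def suppress_small_cells(table, threshold=3):
    """Primary suppression: hide non-zero counts below the (assumed) threshold."""
    return [[None if 0 < count < threshold else count for count in row]
            for row in table]


def random_round(count, base=3, rng=random):
    """Unbiased random rounding of a single count to a multiple of base."""
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of the base
    lower = count - remainder
    # Round up with probability remainder/base, so the expected
    # value of the rounded cell equals the original count.
    return lower + base if rng.random() < remainder / base else lower


def round_table(table, base=3, seed=None):
    """Apply random rounding independently to every cell of a table."""
    rng = random.Random(seed)
    return [[random_round(count, base, rng) for count in row] for row in table]


# Hypothetical 2 x 3 frequency table, for illustration only.
table = [[1, 12, 7], [0, 2, 25]]
print(suppress_small_cells(table))
print(round_table(table, seed=42))
```

Because each cell is rounded independently here, row and column totals computed from the rounded cells will generally not match rounded versions of the true totals. Controlled rounding (see, for example, Cox and Ernst 1982) addresses this additivity problem and is not attempted in this sketch.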

This paper focuses on the obfuscation techniques for data control. These techniques are better suited to modern dissemination philosophies and to Gov 2.0 and Web 2.0 concepts of transparency.

Read the Rounding Methods for Online Data Dissemination paper (available in A4 and Letter formats) to learn more about each method, who uses it, and when to apply it.

References and further reading

Causey, B, L Cox, and L Ernst. 1985. Applications of transportation theory to statistical problems. Journal of the American Statistical Association 80:903-909.

Cox, L. 1987. A constructive procedure for unbiased controlled rounding. Journal of the American Statistical Association 82:520-524.

Cox, L, and L Ernst. 1982. Controlled rounding. INFOR 20:423-432.

Ellenberger, John, and Stuart Muir. 2009. National Statistical Offices and the Prosumer Challenge. In NTTS 2009 Conference Proceedings.

Fraser, B, and J Wooton. 2005. A proposed method for confidentialising tabular output to protect against differencing. Monographs of official statistics.

Leaver, Victoria. 2009. Implementing a method for automatically protecting user-defined census tables. In Joint UNECE/Eurostat work session on statistical data confidentiality. Spain.

Lowthian, Phillip, and Giovanni Merola. 2004. The application of controlled rounding for tabular data with particular reference to the Tau-Argus software. In Methods for Statistics for UK Countries and Regions Conference 2004.

Massell, PB. 2003. Statistical disclosure control for tables: Determining which method to use. In Statistics Canada Conf. on Statistical Methodology, Oct.

Salazar-Gonzalez, J, and C Bycroft. 2006. The controlled rounding implementation. In Monographs of official statistics.

Salazar-Gonzalez, Juan-Jose, Phillip Lowthian, Caroline Young, Giovanni Merola, Stephen Bond, and David Brown. 2004. Getting the best results in controlled rounding with the least effort. In Privacy in statistical databases: CASC Project final conference, PSD 2004, edited by J. Domingo-Ferrer and V. Torra. Spain: Springer.

Salazar-Gonzalez, Juan-Jose, and Markus Schoch. 2004. A new tool for applying controlled rounding to a statistical table in Microsoft Excel. In Privacy in statistical databases: CASC Project final conference, PSD 2004, edited by J. Domingo-Ferrer and V. Torra. Spain: Springer.

Sande, G. 2003. A Less Intrusive Variant on Cell Suppression to Protect the Confidentiality of Business …. Unpublished report.

Shlomo, N. 2007. Statistical disclosure control methods for census frequency tables. International Statistical Review 75 (2):199-217.

Staggemeier, Andrea Toniolo, Phillip Lowthian, and Grant Lee. 2007. Applying Tau-Argus to SuperCROSS tables: a practical example using the UK business register unit data. In Joint UNECE/Eurostat work session on statistical data confidentiality.

STR. 2009. Privacy Protection - Disclosure Control and Confidentiality. In White Paper.

Willenborg, Leon, and Ton de Waal. 1996. Statistical disclosure control in practice. Vol. 111, Lecture Notes in Statistics. New York: Springer-Verlag.