Posts Tagged ‘confidentiality’

5 Links about Privacy Protection in Official Statistics

Saturday, January 28th, 2012 by Don McIntosh

According to the official site for Data Privacy Day in the US, it is intended to promote “awareness about the many ways personal information is collected, stored, used, and shared, and education about privacy practices that will enable individuals to protect their personal information.” In the spirit of this, here are a few useful links to help people learn more about protecting privacy in official statistics.

Privacy protection as it relates to official statistics is known as “Statistical Disclosure Control”, or simply “Confidentiality”. It’s all about protecting confidential information about specific individuals while still making sure that we can maximize the usefulness and accessibility of government data.

  1. This post debunks the fairly widespread view that anonymizing data by removing names, addresses and such makes it safe to publish with no threat to people’s privacy.
  2. Our page, with a link to the white paper we co-authored with our partner Symbolix, about safe dissemination through the use of statistical disclosure control.
  3. Confidentiality Information Sheets from the Australian Bureau of Statistics.
  4. A comprehensive set of Government-Created Resources related to privacy put together by the organizers of Data Privacy Day.
  5. The US National Institute of Standards and Technology’s Guide to Protecting the Confidentiality of Personally Identifiable Information (PII), a fairly chunky PDF but worth a look if you are in the business of making government data available.

Happy Data Privacy Day!

Protecting confidentiality - some real life examples

Sunday, November 1st, 2009 by Don McIntosh

This blog post is about how we are enabling our customers to disseminate detailed information while protecting the privacy of individuals. In the context of providing official statistics, making data more available, and making governments more transparent, we show that it *can* be done - you *can* release data.

We are currently working with three customers to develop new requirements around privacy protection for their data. For two of the three, the main goal is to deliver more detailed, useful data to their customers without compromising privacy. The other key goals are reducing the risk of accidentally releasing sensitive data (a goal of increasing importance given the Gov 2.0-fueled demand for more open data), and reducing the costs associated with applying privacy protection. I thought I’d write a short note to summarise our recent work in this area.

We have an API plugin architecture for applying disclosure control. Basically, you can build your own modules that do things like adjust, conceal, and/or annotate cell values based on certain rules, or reject a query if it’s deemed too sensitive for whatever reason. You can also record query details and use them to monitor for potential privacy intrusions.
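To make the idea concrete, here is a minimal sketch in Python of what a chain of such modules could look like. It is not the SuperSTAR API: `Cell`, `QueryRejected`, `round_to_base`, `reject_if_total_below` and `run_query` are all hypothetical names invented for this illustration, and the rejection rule (refusing tables whose overall count is very small) is just one assumed example of a query being "too sensitive".

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Cell:
    """One table cell: a value (None if concealed) plus an optional annotation."""
    value: Optional[float]
    note: str = ""


Table = List[List[Cell]]
Rule = Callable[[Table], Table]  # hypothetical plugin signature


class QueryRejected(Exception):
    """Raised by a rule that deems a query too sensitive to answer."""


def round_to_base(base: int) -> Rule:
    """Build a rule that adjusts every cell to the nearest multiple of `base`."""
    def rule(table: Table) -> Table:
        for row in table:
            for cell in row:
                if cell.value is not None:
                    # floor(x + 0.5) rounds halves up, unlike Python's round()
                    cell.value = base * math.floor(cell.value / base + 0.5)
                    cell.note = f"rounded to base {base}"
        return table
    return rule


def reject_if_total_below(minimum: float) -> Rule:
    """Build a rule that rejects the whole query when the table is too small."""
    def rule(table: Table) -> Table:
        total = sum(c.value for row in table for c in row if c.value is not None)
        if total < minimum:
            raise QueryRejected(f"table total {total} is below {minimum}")
        return table
    return rule


def run_query(query_id: str, table: Table, rules: List[Rule],
              audit_log: List[str]) -> Table:
    """Record the query, then pass the table through each rule in order."""
    audit_log.append(query_id)  # logged first, so rejected queries are recorded too
    for rule in rules:
        table = rule(table)
    return table
```

A caller would build a list of rules, for example `[reject_if_total_below(10), round_to_base(5)]`, and pass every table through `run_query`. Because the query is logged before any rule runs, the audit log also captures rejected queries, which are often the most interesting ones when monitoring for attempted privacy intrusions.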

The work we are looking at doing in relation to current customer requests includes the following:

  • Implementing plugins with customised rounding and concealment rules. This is straightforward work as far as our current architecture is concerned, and helps customers with these requirements to implement rules that maximise the data they can make available. For one customer, we have written a plugin that suppresses numbers at or below a certain threshold, along with any related totals. For example, if you were suppressing all numbers less than or equal to 3, a simple table would show that cell suppressed, plus any totals containing that cell. The example table demonstrates how a returned table would look. Suppressing the totals prevents someone from back-calculating a suppressed value by subtracting the visible cells from the total.
[Figure: Suppressed Table]

  • Allowing custom selection of different rule combinations for testing and more advanced use of disclosure control. This is useful especially where you have a few in-house specialists who are authorised to be more lenient in terms of what rules need to be applied when responding to ad hoc information requests.
  • Extending confidentiality to apply to the output of calculations (SuperSTAR field derivations). For example, you might have a function that in some cases returns “..C” instead of a real value for certain cells, as in the example above. This would be useful when calculating a statistical mean or median, where the result should be concealed if there are fewer than a certain number of contributors.
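
The suppression and derived-value rules described in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than our actual plugin code: the function names `suppress_small_cells` and `concealed_mean`, the default threshold of 3, and the minimum of 5 contributors are all hypothetical. Only the “..C” marker comes from the example above.

```python
from typing import List, Sequence, Union

CONCEALED = "..C"  # marker shown in place of a concealed value
Value = Union[int, float, str]


def suppress_small_cells(counts: Sequence[Sequence[int]],
                         threshold: int = 3) -> List[List[Value]]:
    """Return the table with a row-total column and a column-total row added.

    Any cell <= threshold is concealed, and so is every total that
    contains a concealed cell, so the hidden value cannot be recovered
    by subtracting the visible cells from a total.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    hidden = [[v <= threshold for v in row] for row in counts]

    out: List[List[Value]] = []
    for r, row in enumerate(counts):
        cells: List[Value] = [CONCEALED if hidden[r][c] else v
                              for c, v in enumerate(row)]
        row_total: Value = CONCEALED if any(hidden[r]) else sum(row)
        out.append(cells + [row_total])

    totals_row: List[Value] = []
    for c in range(n_cols):
        col_hidden = any(hidden[r][c] for r in range(n_rows))
        col_sum = sum(counts[r][c] for r in range(n_rows))
        totals_row.append(CONCEALED if col_hidden else col_sum)

    # The grand total contains every cell, so one hidden cell conceals it.
    any_hidden = any(any(h) for h in hidden)
    grand: Value = CONCEALED if any_hidden else sum(sum(row) for row in counts)
    out.append(totals_row + [grand])
    return out


def concealed_mean(values: Sequence[float], min_contributors: int = 5) -> Value:
    """Return the mean, or the concealment marker if too few contributors."""
    if len(values) < min_contributors:
        return CONCEALED
    return sum(values) / len(values)
```

For the 2x2 table `[[12, 2], [8, 15]]` with a threshold of 3, the cell containing 2 is concealed together with its row total, its column total and the grand total, while the second row total (23) and the first column total (20) remain visible. Note that under this simple rule zeros are concealed as well, since they also fall at or below the threshold; a real rule set might treat them differently.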

We are really keen to hear from our customers and other interested parties. If you have some recent experience in using confidentiality in SuperSTAR or elsewhere, or would like to give us any kind of related feedback, please do feel free to leave a comment or contact us directly.