Archive for the ‘Privacy protection’ Category

Embracing Advanced Visualization - apps4NSW Comp entries

Friday, March 26th, 2010 by Jo Deeker

Space-Time Research have developed two entries for the apps4NSW competition (for New South Wales, Australia) using SuperVIEW.  The apps4NSW competition, like the Mashup Australia and Apps For Democracy competitions, invited the public to submit ideas and applications that would benefit the citizens of New South Wales.

I’m excited about our two applications because they are genuinely useful online interactive publications of complex data that everyone will benefit from.  Our Why Australians Travel application presents a dataset from Tourism Research Australia that has not been made available to the public in an interactive way before.  It also includes advanced visualization in the form of a Motion Chart (Gapminder-style) which we’re very excited by! The motion chart can tell a story with data over time that you simply don’t see in static tables or reports.

The How Safe Is Your Suburb 2.0 application provides NSW crime data in an interactive way, allowing users to analyse relative or absolute crime rates by suburb.  This application is supported by one of our newest features, metadata, which gives users explanations of the data to help them understand what it means.

Go check our applications out and vote for us if you like them!  And if you have any feedback on our entries please don’t hesitate to make a comment on our blog here.

Gov 2.0 Radio Interview: The Future of Privacy

Thursday, March 18th, 2010 by Jo Deeker

Don McIntosh was recently a guest on Gov 2.0 Radio discussing the future of Privacy and how it relates to data.

Said Don:
“Many people, especially Gen Y, have the view that privacy is not an issue for them and to quote Eric Schmidt, ‘If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.’ I much prefer the view of Bruce Schneier, who is pretty much the world’s leading expert in information security, who points out in an excellent essay very clearly that people espousing that view ‘… accept the premise that privacy is about hiding a wrong. It’s not. Privacy is an inherent human right, and a requirement for maintaining the human condition with dignity and respect.’”

Click here to listen to the podcast.

Introducing SuperVIEW Collaboration

Wednesday, February 3rd, 2010 by Jo Deeker

SuperVIEW is our solution for Interactive Publication, Exploration & Visualization of Public Data. Our latest version has a new collaboration feature that we want to share with you.

Using our new SuperVIEW Collaboration features, you can make comments or invite others to make comments on your visualizations using Google Friend Connect.  You can also share your customized visualisation with others using our new Share feature. The Share feature allows you to embed a link to your view in a website, blog, Facebook, Twitter or your other favorite social networking application.

Recently Craig Thomler, a well-known active participant and leader in the Australian Gov2.0 movement, wrote a blog post on the new data.gov.uk site, which he considers the world leader in open data websites.  He then goes on to make a wishlist of what we could do in Australia to the data.australia.gov.au site to make it the best in the world.  Some of what he is asking for is delivered by SuperVIEW right now, including the ability for people to embed visualizations into their own sites, and the ability for every dataset to support a discussion in which people can ask questions to clarify what the dataset contains and discuss how it could be presented in a more usable way.

View this video to see SuperVIEW Collaboration in action.

If you have any questions about SuperVIEW please contact  jo.deeker@spacetimeresearch.com

Do government agencies know enough about the limits of anonymization?

Monday, January 18th, 2010 by Don McIntosh

There is a new wave of open government data scheduled to crash over the US on January 22, resulting from the government’s Open Government Directive. Is the government paying enough attention to the data privacy issues that this deluge could trigger? And how aware are agencies of the well-established fact that anonymizing data is often an inadequate means of protecting privacy in public sector information, and that in many cases more “scrubbing” of the data is needed before any part of it can be safely released for public use?

Until recently, many government agencies have not been motivated to provide data transparency. Compared with the work that directly aligns with their mission and funding, being a visionary supporter of the principles of transparent government is not really high on the agenda. In fact, in many cases, the message from up high hasn’t really reached them at all (one senior US government official’s take on Gov 2.0 was “oh, that’s a subset of Web 2.0 isn’t it?”). If you add to this reluctance the quite significant disincentives, such as the risks of being too transparent, inadvertent privacy breaches, and plain and simple costs, then it’s not surprising that the average department hasn’t been as enthusiastic as the Gov 2.0 activist community might like them to be. And if the ROI on the whole deal is often external, why bother?

Well, there’s nothing like a directive straight from the top to get things moving. As of December 8, U.S. federal agencies had 45 days to get three “high-value datasets” published online and available through data.gov. Wow! Having worked with national statistics agencies for many years, I have some grasp of how long they typically take to publish data and it’s often longer than this, especially when you are dealing with data that has not previously been published. Of course, the data in some cases might be basic lists of non-sensitive material, in which case perhaps it is not too much extra work to make it suitable for public access. What I’m interested in examining is what it will take for agencies that don’t have it that easy, who will need to derive statistics from their data, or reduce it in some way to make it “safe” for public consumption.

Firstly, why bother publishing statistics if the raw data is available? Isn’t the open data community interested in getting “raw data now”, so that it’s quick for the agency and promises maximum flexibility for users? The reality in many cases, and one that seems to still be ignored by some who work in Information Management, is that “de-identifying” data by stripping obviously identifying attributes from it, such as names, addresses, SSNs, etc., does not necessarily protect privacy. It can still be a fairly trivial exercise for an ill-meaning data analyst, or even a non-technical person in many cases, to re-identify many of the people in the list. That is why in many cases we’ll see statistics being released about the data, rather than the raw data itself.

Associate Professor of Law Paul Ohm from the University of Colorado released a paper about the “Surprising Failure of Anonymization” last year, citing some prominent cases where anonymized data was re-identified and pointing out that there are many laws and regulations that are based on the false assumption of anonymization being a panacea for data privacy protection. In one example he describes, a researcher demonstrated how 87.1% of people in the U.S. were uniquely identified by their combined ZIP code, birth date, and sex. He also covers the AOL search data scandal, where individuals were identified from vast volumes of data by their unique search habits, uncovering some embarrassing personal information along the way.
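To make the ZIP code, birth date and sex example concrete, here is a minimal Python sketch of how one might measure the share of records in a “de-identified” list that are still unique on a handful of attributes. The records, field names and the `unique_fraction` helper are all invented for illustration; they are not taken from Ohm’s paper or from any real dataset.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Return the share of records whose combination of the given
    attributes appears exactly once in the list."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Invented records with names, addresses and SSNs already stripped out.
records = [
    {"zip": "80302", "birth_date": "1970-01-15", "sex": "F", "diagnosis": "A"},
    {"zip": "80302", "birth_date": "1970-01-15", "sex": "M", "diagnosis": "B"},
    {"zip": "80302", "birth_date": "1982-06-30", "sex": "F", "diagnosis": "C"},
    {"zip": "80303", "birth_date": "1982-06-30", "sex": "F", "diagnosis": "D"},
    {"zip": "80303", "birth_date": "1982-06-30", "sex": "F", "diagnosis": "E"},
]

by_sex = unique_fraction(records, ["sex"])
by_combination = unique_fraction(records, ["zip", "birth_date", "sex"])
print(f"unique on sex alone: {by_sex:.0%}")
print(f"unique on zip + birth date + sex: {by_combination:.0%}")
```

Each attribute looks harmless on its own, but in this toy list the combination singles out three of the five records. Anyone holding another list that pairs those same three attributes with a name could then attach the “diagnosis” column to a person.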

While the individual agencies may not all have a clear understanding of all the potential privacy issues related to open data, at least the federal administration does have a focus on this. The directive itself states that data can only be made available “subject to valid privacy, confidentiality ….. restrictions”. In addition, the “Concept of Operations” paper for data.gov does have privacy in its sights, stating that there will be working groups looking into privacy issues arising from how data is mashed up and/or used in applications. I would point out that these groups could get a head start simply by reading Paul Ohm’s paper, rather than waiting until after this round of data has been released. It seems that for the moment at least, the idea of what constitutes adequate privacy protection for open data is really up to each agency to decide.

While the working groups deliberate on how to address privacy issues that result from data mashups and the like, many datasets will be posted to data.gov. Despite the proven limits of anonymization, the experience that my colleagues and I have gained from talking with people who work in Information Management in government is that key staff in at least some agencies are not sufficiently aware of those limits and that, in their view, anonymization is essentially all you need to do to make data safe for release. I’d be interested to know if this agrees with others’ observations.

My observation regarding government’s understanding of data privacy issues is based largely on anecdotal evidence collected by my colleagues and me. Perhaps I am overstating things and agencies do have the required skills and knowledge to release data safely. It would be good to hear about how different agencies are dealing with the Open Government Directive and what you think about the challenges of releasing useful data without unduly compromising privacy.

Note: Ohm’s paper is fairly lengthy. For a very interesting summary of the paper, you can check out this post on Ars Technica, which sparked a lot of debate regarding the importance of privacy.

Australian Privacy Awards 2009 - “Hey, that’s just what we do!!”

Friday, November 13th, 2009 by Don McIntosh

Most people have an opinion about privacy these days, from Scott McNealy’s memorable throwaway line “You have zero privacy. Get over it”, to the fierce concerns many people have around how much information Google stores about each and every one of us. Well, I certainly feel it’s important and it was great to have the opportunity to meet many other like-minded people at the Australian Privacy Awards dinner last night.

Special Minister of State and Cabinet Secretary Senator Joe Ludwig started the night with a good overview of the state of play, with many people and organisations struggling to come to terms with technology advances such as social networking that have such far-reaching effects on privacy. He mentioned the need for balancing government transparency and protecting personal information so many times that I felt like jumping up and saying “Hey, that’s just what we do!!”

It certainly was an honour to receive the “highly commended” award in our category on Space-Time Research’s behalf and I’d like to thank the Office of the Privacy Commissioner for giving us the opportunity to be part of the whole event, and to meet and talk with so many people who work in this area. However, what I really wanted to mention in this post was a couple of award winners that I found particularly interesting.

Dr Roger Clarke was a worthy winner of the Australian Privacy Medal. Dr Clarke used his speech to remind the audience that there was a lot of real work that needed to be done, and that he felt his medal was a little tarnished, because some people, including some of the award winners, were essentially just window dressing (not his terms, but I think that was the gist of it) and not really applying a genuine effort to promote privacy. Rooms are typically politely quiet when people give speeches but I think in this case it was a pregnant, slightly awkward kind of quiet.

There’s some great information on Dr Roger Clarke’s website about information privacy. In fact, I came across one note where he mentioned data protection that has made me rethink why we are using this term. He makes a really good point that many laws focus on data protection, where the focus is protecting data about people. As he explains, the real issue is to protect the people and you do that by considering what information might be derived from the data, rather than just protecting the data itself. Very good point.

Another winner I really liked was the Victorian Department of Justice (and not just because they are our customer). Who would have thought promoting privacy practices could be so fun or entertaining? Well, the people at the Department of Justice certainly did. As an example, their most recent idea is to put together a radio show based on the X-Files concept. It will be called the P-files, with some really witty variations on Scully and Mulder’s names that have totally slipped my mind. One way or another, they plan to slip in the line “is that a USB stick in your pocket or are you just pleased to see me?” It was really refreshing to hear about their work and I do hope they have inspired many people there to take an equally innovative and enthusiastic approach not just to promoting privacy practices, but to many other aspects of their work. I’m sure that even Dr Clarke would agree that they were really deserving winners.

Until 18 months ago, I’d never heard of the Office of the Privacy Commissioner. Now I know a whole community of people who are working to help Australians find the right balance and have some control of what parts of their lives are public knowledge. Privacy may not seem like an important issue to many in this age of Facebook and with the attitudes of Gen Y but I think Roger summed it up very nicely in his speech: “Privacy doesn’t matter until it does.”


Protecting confidentiality - some real life examples

Sunday, November 1st, 2009 by Don McIntosh

This blog post is about how we are enabling our customers to disseminate detailed information while protecting the privacy of individuals. In the context of providing official statistics, making data more available, and making governments more transparent, we show that it *can* be done - you *can* release data.

We are currently working with three customers to develop new requirements for privacy protection on their data. For two of the three, the main goal is to deliver more detailed, useful data to their customers without compromising privacy. The other key goals are reducing the risk of accidentally releasing sensitive data (a goal of increasing importance given the Gov 2.0-fueled demand for more open data), and reducing the costs associated with applying privacy protection. I thought I’d write a short note to summarise our work in this area of late.

We have an API plugin architecture for applying disclosure control. Basically, you can build your own modules that do things like adjust, conceal, and/or annotate cell values based on certain rules, or reject a query if it’s deemed too sensitive for whatever reason. You can also record query details and use them to monitor for potential privacy intrusions.
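As a rough illustration of the idea, here is a minimal Python sketch of what a chain of disclosure control modules could look like. This is not the SuperSTAR API: every class and function name here (`Cell`, `DisclosurePlugin`, `QueryLog`, `RejectTooDetailed`, `ConcealSmall`, `RoundToBase`, `run_pipeline`) and the shape of the `query` dictionary are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass
class Cell:
    value: Optional[float]  # None means the value has been concealed
    annotation: str = ""


class QueryRejected(Exception):
    """Raised when a module deems a query too sensitive to answer."""


class DisclosurePlugin(Protocol):
    """Hypothetical module interface: take the query and its cells, return cells."""
    def apply(self, query: dict, cells: List[Cell]) -> List[Cell]: ...


class QueryLog:
    """Records query details so they can be monitored later."""
    def __init__(self):
        self.entries = []

    def apply(self, query, cells):
        self.entries.append(dict(query))
        return cells


class RejectTooDetailed:
    """Rejects queries that cross-tabulate more dimensions than allowed."""
    def __init__(self, max_dimensions):
        self.max_dimensions = max_dimensions

    def apply(self, query, cells):
        if len(query["dimensions"]) > self.max_dimensions:
            raise QueryRejected("too many dimensions requested")
        return cells


class ConcealSmall:
    """Conceals and annotates any cell below the threshold."""
    def __init__(self, threshold):
        self.threshold = threshold

    def apply(self, query, cells):
        return [Cell(None, "..C")
                if c.value is not None and c.value < self.threshold else c
                for c in cells]


class RoundToBase:
    """Adjusts non-negative counts to the nearest multiple of a base."""
    def __init__(self, base):
        self.base = base

    def apply(self, query, cells):
        return [c if c.value is None
                else Cell(int(c.value / self.base + 0.5) * self.base, "rounded")
                for c in cells]


def run_pipeline(query, cells, plugins):
    """Pass the cells through each module in order."""
    for plugin in plugins:
        cells = plugin.apply(query, cells)
    return cells


audit = QueryLog()
plugins = [audit, RejectTooDetailed(3), ConcealSmall(3), RoundToBase(5)]
query = {"user": "analyst", "dimensions": ["suburb", "offence"]}
result = run_pipeline(query, [Cell(12), Cell(2), Cell(48)], plugins)
print([(c.value, c.annotation) for c in result])
```

The order of the modules matters in a design like this: the log runs first so that rejected queries are still recorded, and concealment runs before rounding so that a small value cannot be rounded up past the threshold and slip through.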

The work we are looking at doing in relation to current customer requests includes the following:

  • Implementing plugins with customised rounding and concealment rules. This is straightforward work as far as our current architecture is concerned, and helps our customers with these requirements to implement rules that maximise the data they can make available. For one customer, we have written a plugin that will suppress numbers less than a certain value, and any related totals. So for example, if you were suppressing all numbers in a table less than or equal to 3, a simple table would show suppression of that cell, plus any totals containing that cell. The example table demonstrates how a returned table would look. By suppressing the totals, you are preventing someone from back-calculating a value that has been suppressed.
Suppressed Table

  • Allowing custom selection of different rule combinations for testing and more advanced use of disclosure control. This is useful especially where you have a few in-house specialists who are authorised to be more lenient in terms of what rules need to be applied when responding to ad hoc information requests.
  • Extending confidentiality to apply to the output of calculations (SuperSTAR field derivations). For example, you might have a function that in some cases returns “..C” instead of a real value for certain cells, as per the example above. Confidentiality can be extended to work with derived data in the same way. For example, it would be useful for determining a statistical mean or median and concealing the result if there were fewer than a certain number of contributors.
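The suppression and derived-value rules in the list above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names (`suppress_table`, `concealed_mean`), the thresholds and the counts are invented, and the code is not the plugin we wrote for our customer. The “..C” marker is the one mentioned above.

```python
CONCEALED = "..C"


def suppress_table(rows, threshold=3):
    """Take a grid of interior counts and return it with row totals,
    column totals and a grand total added. Any count <= threshold is
    concealed, along with every total that contains it."""
    n_rows, n_cols = len(rows), len(rows[0])
    small = [[v <= threshold for v in row] for row in rows]

    out = []
    for r, row in enumerate(rows):
        cells = [CONCEALED if small[r][c] else v for c, v in enumerate(row)]
        row_total = CONCEALED if any(small[r]) else sum(row)
        out.append(cells + [row_total])

    col_totals = []
    for c in range(n_cols):
        if any(small[r][c] for r in range(n_rows)):
            col_totals.append(CONCEALED)
        else:
            col_totals.append(sum(rows[r][c] for r in range(n_rows)))

    any_small = any(any(flags) for flags in small)
    grand_total = CONCEALED if any_small else sum(sum(row) for row in rows)
    out.append(col_totals + [grand_total])
    return out


def concealed_mean(values, min_contributors=5):
    """Return the mean, or the concealment marker if too few
    contributors are behind it."""
    if len(values) < min_contributors:
        return CONCEALED
    return sum(values) / len(values)


# Invented counts: two rows by three columns, with one small cell.
table = suppress_table([[12, 2, 30], [8, 15, 22]], threshold=3)
for line in table:
    print(line)

print(concealed_mean([10, 20, 30]))
print(concealed_mean([10, 20, 30, 40, 50]))
```

In this toy table the single small cell takes its row total, its column total and the grand total with it, so none of the published figures can be subtracted to recover it. A rule this simple is not guaranteed to be enough for every table layout, which is one reason the rules are worth keeping configurable per customer.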

We are really keen to hear from our customers and other interested parties. If you have some recent experience in using confidentiality in SuperSTAR or elsewhere, or would like to give us any kind of related feedback, please do feel free to leave a comment or contact us directly.