Archive for November, 2009

Australian Privacy Awards 2009 - “Hey, that’s just what we do!!”

Friday, November 13th, 2009 by Don McIntosh

Most people have an opinion about privacy these days, from Scott McNealy’s memorable throw away line “You have zero privacy. Get over it”, to the fierce concerns many people have around how much information Google stores about each and every one of us. Well, I certainly feel it’s important and it was great to have the opportunity to meet many other like-minded people at the Australian Privacy Awards dinner last night.

Special Minister of State and Cabinet Secretary Senator Joe Ludwig started the night with a good overview of the state of play, with many people and organisations struggling to come to terms with technology advances such as social networking that have such far-reaching effects on privacy. He mentioned the need for balancing government transparency and protecting personal information so many times that I felt like jumping up and saying “Hey, that’s just what we do!!”

It certainly was an honour to receive the “highly commended” award in our category on Space-Time Research’s behalf, and I’d like to thank the Office of the Privacy Commissioner for giving us the opportunity to be part of the whole event, and to meet and talk with so many people who work in this area. However, what I really wanted to mention in this post was a couple of award winners that I found particularly interesting.

Dr Roger Clarke was a worthy winner of the Australian Privacy Medal. Dr Clarke used his speech to remind the audience that there was a lot of real work that needed to be done, and that he felt his medal was a little tarnished, because some people, including some of the award winners, were essentially just window dressing (not his terms - but I think that was the gist of it), and not really applying a genuine effort to promote privacy. Rooms are typically politely quiet when people give speeches but I think in this case it was a pregnant, slightly awkward kind of quiet.

There’s some great information on Dr Roger Clarke’s website about information privacy. In fact, I came across one note where he mentioned data protection that has made me rethink why we are using this term. He makes a really good point that many laws focus on data protection, where the focus is protecting data about people. As he explains, the real issue is to protect the people and you do that by considering what information might be derived from the data, rather than just protecting the data itself. Very good point.

Another winner I really liked was the Victorian Department of Justice (and not just because they are our customer). Who would have thought promoting privacy practices could be so fun or entertaining? Well, the people at the Department of Justice certainly did. As an example, their most recent idea is to put together a radio show based on the X-Files concept. It will be called the P-files, with some really witty variations on Scully and Mulder’s names that have totally slipped my mind. One way or another, they plan to slip in the line “is that a USB stick in your pocket or are you just pleased to see me?” It was really refreshing to hear about their work and I do hope they have inspired many people there to take an equally innovative and enthusiastic approach, not just to promoting privacy practices, but to many other aspects of their work. I’m sure that even Dr Clarke would agree that they were really deserving winners.

Until 18 months ago, I’d never heard of the Office of the Privacy Commissioner. Now I know a whole community of people who are working to help Australians find the right balance and have some control of what parts of their lives are public knowledge. Privacy may not seem like an important issue to many in this age of Facebook and with the attitudes of Gen Y, but I think Roger summed it up very nicely in his speech: “Privacy doesn’t matter until it does.”


How Safe Is Your Suburb - Mashup entry

Friday, November 13th, 2009 by Jo Deeker

How Safe Is Your Suburb was an entry in the Mashup Australia contest.

Click here to try How Safe Is Your Suburb

How Safe Is Your Suburb is an easy-to-use interactive web application for the public to gain greater insight into crime statistics in Local Government Areas (suburbs) in New South Wales. The application can be used for informed discussion and policy development by residents, police authorities, and local government. The application shows how statistics can be applied in the everyday life of the community.

How Safe Is Your Suburb embraces the Gov 2.0 philosophy by opening up a static dataset to the public in a useful way.  The user can analyse and play with the data, comment on data, and then share their data with others.

For example, the user can choose different ‘reports’, make selections within each report to compare different types of crime over time, and then see which types of crime are more prevalent in their area. They can view an interactive thematic map of crime that provides a spatial visualisation of crime types across LGAs for a given year. They can also identify which suburbs have higher crime rates, both in total and per head of population. (It makes sense that there is more crime in more populous areas.) Users can make comments on each visualisation they are working on.

The application mashes up NSW crime data with LGA boundary files and Census data from ABS.  Space-Time Research has classified each offence into different categories to enable simpler analysis.  More detail could be added to the application at a later date.

The application is built using Space-Time Research’s SuperVIEW product, and is hosted on the Google App Engine. In the spirit of a govhack style competition, our team of three (one database builder, one programmer, one analyst / writer) started working on the application just over 24 hours before it was due.

We would also like to share our experience of mashing up and visualising the data. We have found:

  • There is an unexpected spike in road traffic offences in 2001 and 2002 and then no road traffic offences recorded after that. This is seen across most LGAs. Only by visualizing the data in a chart did we see the problem, and we would suggest that the data quality be checked with the NSW Bureau of Crime Statistics and Research before releasing this data. Perhaps the data should be footnoted.
  • We discovered gaps when joining by LGA – our map file, the ABS census data and the NSW Bureau of Crime Statistics and Research data all cover slightly different sets of LGAs. We don’t know what year the LGAs in the source data were referenced to, and our application currently joins on LGA name rather than LGA id.
  • We chose to refer to the spatial areas as ‘suburbs’ to make it easier for the general public to relate to. We are aware that LGAs are different from postcode boundaries and that the general public will not be aware of the difference between the two types of geographic boundary. Most members of the public may not know what an LGA is, so we have referenced suburbs with LGA in parentheses throughout the application.
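
For anyone wanting to check for the same kind of join gaps in their own data, here is a minimal Python sketch. It is not the code used in the application: the area names and figures are invented, and the name normalisation (lower-casing and collapsing whitespace) is just one assumed way of making a name-based join a little more forgiving.

```python
def normalise(name):
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def join_gaps(crime_by_lga, population_by_lga):
    """Join two {lga_name: value} dicts on the normalised LGA name.

    Returns (joined, only_in_crime, only_in_population) so that the
    areas that fail to join are visible rather than silently dropped.
    """
    crime = {normalise(k): v for k, v in crime_by_lga.items()}
    population = {normalise(k): v for k, v in population_by_lga.items()}
    joined = {k: (crime[k], population[k]) for k in crime.keys() & population.keys()}
    only_in_crime = sorted(crime.keys() - population.keys())
    only_in_population = sorted(population.keys() - crime.keys())
    return joined, only_in_crime, only_in_population


# Invented names and numbers, for illustration only.
crime = {"Alpha Shire": 120, "Beta  City ": 80, "Old Shire": 15}
population = {"alpha shire": 50000, "Beta City": 40000, "New Council": 30000}

joined, only_in_crime, only_in_population = join_gaps(crime, population)
print(joined)
print("No population data for:", only_in_crime)
print("No crime data for:", only_in_population)
```

Joining on a stable LGA id, where one is available in every source, would avoid the need for name normalisation altogether.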

Ideas for enhancing the application include:

  • Enhancing the share functionality by including a ‘share this’ option for Twitter, Facebook etc.
  • Expanding the application to allow analysis by individual offence types.
  • Incorporating other ABS census demographic data, such as population count to calculate offences per head of population, and inclusion of employment, education, age breakdown etc. to see if demographics of an LGA impact crime rates.
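
As a small illustration of why offences per head of population matter, the sketch below (Python, with invented figures and a made-up function name) shows how ranking areas by total offences and by rate can give opposite answers.

```python
def offences_per_100k(offences, population):
    """Offence rate expressed per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return offences * 100_000 / population


# Invented figures, for illustration only: (offences, population).
areas = {
    "Big City": (1000, 200_000),
    "Small Town": (300, 30_000),
}

by_total = max(areas, key=lambda name: areas[name][0])
by_rate = max(areas, key=lambda name: offences_per_100k(*areas[name]))

print("Most offences in total:", by_total)
print("Highest rate per 100,000:", by_rate)
```

Here the larger area records more offences overall, but the smaller one has the higher rate once population is taken into account.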

KML Cruncher - Mashup entry

Thursday, November 12th, 2009 by Andrew Naish

The KML Cruncher was an entry in the Mashup Australia contest.

Click here to try the KML Cruncher

The KML Cruncher is a utility that converts and generalizes ESRI polygon shape files into KML ready for the web. It is useful for people who want to move quickly from the shape file format into KML for web mashups.

Using the utility is easy - here’s an example of how to convert an ESRI polygon shape file to a KML file ready for the web:

Step 1 Obtain the shape file you would like to convert and save it to a local drive.

There are many example shape files at http://data.australia.gov.au.

In this example I will use the ‘Drainage Basins Queensland’ dataset available at http://data.australia.gov.au/134. Note, this utility works with polygon shape files only, so ensure you obtain a shape file that contains polygons (also referred to as ‘boundaries’). The ‘Drainage Basins Queensland’ dataset is archived in a .zip file, so make sure you extract it to your local drive before continuing.

Step 2 Now you are ready to convert your shape file.

  1. Click on the Browse button next to the ‘Choose a shape file (*.shp):’ text box.
  2. Locate and select the *.shp file from your local hard drive.

In this example I used the ‘Drainage Basins Queensland’ dataset at http://data.australia.gov.au/134, so I will select the ‘IQATLAS.QLD_DRNBASIN_100K.shp’ file.

Step 3 Specify the dbf file.

  1. Next to the ‘Choose a dbf file (*.dbf):’ field, click on the Browse button.
  2. Locate and select the associated *.dbf file.

In this example I need the *.dbf file that is associated with the *.shp file selected in step 2, so I will select the ‘IQATLAS.QLD_DRNBASIN_100K.dbf’ file.

Step 4 Specify a label field. Note this field is optional.

The label field is used as an identifier for each of your converted polygons – once in KML format this is what will be shown in the information window when you click on a polygon.

This field is optional; if you do not specify it, the utility will take the first field it finds. If you would like to know what fields are available in your .dbf file, you can open it using Microsoft Excel, or if you would like to inspect the data further before converting, try ESRI’s ArcExplorer product.

In this example I will set the label field to: BASIN_NAME
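
If you would rather list the available fields with a script than open the file in Excel, the following Python sketch reads the field names straight from the .dbf header. It is not part of the KML Cruncher, and the function name is my own; it relies only on the standard dBASE layout, where 32-byte field descriptors start at byte 32 and end with a 0x0D terminator.

```python
import struct


def dbf_field_names(data):
    """Return the field names declared in a dBASE (.dbf) file header.

    `data` is the raw bytes of the file. Bytes 8-9 of the header hold the
    header length; field descriptors are 32 bytes each, start at offset 32,
    and carry the null-padded field name in their first 11 bytes.
    """
    header_length = struct.unpack_from("<H", data, 8)[0]
    names = []
    offset = 32
    while offset + 32 <= header_length and data[offset] != 0x0D:
        raw_name = data[offset:offset + 11]
        names.append(raw_name.split(b"\x00", 1)[0].decode("ascii"))
        offset += 32
    return names
```

To use it, read the file in binary mode, for example `open("IQATLAS.QLD_DRNBASIN_100K.dbf", "rb").read()`, and pass the bytes to `dbf_field_names`.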

Step 5 Specify a generalisation tolerance.

In a nutshell, the generalisation tolerance is a measurement between polygon vertices; if this tolerance is exceeded, one of the vertices will be removed. Generally you will need to specify a larger tolerance for more detailed data sets. It is likely that you will have to convert the shape file a few times to get the right tolerance. Luckily I have had a bit of time to play with it, so I will specify 0.005 as the tolerance.
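
To give a feel for what a tolerance does, here is a minimal Python sketch of one simple generalisation approach: dropping any vertex that sits closer than the tolerance to the last vertex that was kept. This is an assumption made for illustration only; it is not necessarily the method the KML Cruncher uses, and the coordinates are invented.

```python
import math


def thin_ring(ring, tolerance):
    """Drop vertices that lie within `tolerance` of the previously kept vertex.

    `ring` is a closed polygon ring: a list of (x, y) tuples whose first and
    last points are equal. The first and closing points are always kept.
    """
    if len(ring) < 4:
        return list(ring)
    kept = [ring[0]]
    for point in ring[1:-1]:
        if math.dist(kept[-1], point) >= tolerance:
            kept.append(point)
    kept.append(ring[-1])
    # A valid ring needs at least 4 points (3 distinct plus the closing one);
    # if thinning went too far, fall back to the original ring.
    return kept if len(kept) >= 4 else list(ring)


# Invented coordinates: a unit square with two near-duplicate vertices.
square = [(0, 0), (0.001, 0), (1, 0), (1, 1), (1.002, 1.001), (0, 1), (0, 0)]
print(thin_ring(square, 0.005))
```

With this approach a larger tolerance removes more vertices, which gives a smaller file at the cost of a coarser outline.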

Step 6 Convert

  1. Click the convert button.
  2. Wait patiently and you will have a nicely generalised KML file ready to serve on the web!
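
For readers curious about what the converted output looks like, the Python sketch below builds a small KML document containing one labelled polygon. It is an illustration under my own assumptions, not the KML Cruncher's code: the function name, the label and the coordinates are invented, and only the standard KML 2.2 element names are relied on.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def q(tag):
    """Qualify a tag name with the KML namespace."""
    return f"{{{KML_NS}}}{tag}"


def polygon_to_kml(label, ring):
    """Build a KML document containing one labelled polygon.

    `ring` is a closed list of (longitude, latitude) tuples. The label ends
    up in the Placemark name, which is what a map viewer typically shows
    when the polygon is clicked.
    """
    ET.register_namespace("", KML_NS)
    kml = ET.Element(q("kml"))
    document = ET.SubElement(kml, q("Document"))
    placemark = ET.SubElement(document, q("Placemark"))
    ET.SubElement(placemark, q("name")).text = label
    polygon = ET.SubElement(placemark, q("Polygon"))
    outer = ET.SubElement(polygon, q("outerBoundaryIs"))
    linear_ring = ET.SubElement(outer, q("LinearRing"))
    coordinates = ET.SubElement(linear_ring, q("coordinates"))
    coordinates.text = " ".join(f"{lon},{lat}" for lon, lat in ring)
    return ET.tostring(kml, encoding="unicode")


# Invented label and coordinates, for illustration only.
kml_text = polygon_to_kml(
    "Example Basin",
    [(150.0, -25.0), (151.0, -25.0), (151.0, -26.0), (150.0, -25.0)],
)
print(kml_text)
```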

Also for the developers – this is a simple HTTP POST action from a web form (nothing fancy), so it could easily be used as a web service.

Protecting confidentiality - some real life examples

Sunday, November 1st, 2009 by Don McIntosh

This blog post is about how we are enabling our customers to disseminate detailed information while protecting the privacy of individuals. In the context of providing official statistics, making data more available, and making governments more transparent, we show that it *can* be done - you *can* release data.

We are currently engaging with three customers and developing new requirements around the area of privacy protection on their data. For two of the three, the main goal is to deliver more detailed, useful data to their customers without compromising privacy concerns. The other key goals are around reducing the risk of accidentally releasing sensitive data (a goal of increasing importance given the Gov 2.0 fueled demand for more open data), and reducing costs associated with the application of privacy protection. I thought I’d write a short note to summarise our work in this area of late.

We have an API plugin architecture for applying disclosure control. Basically, you can build your own modules that do things like adjust, conceal, and/or annotate cell values based on certain rules, or reject a query if it’s deemed too sensitive for whatever reason. You can also record query details and use them to monitor for potential privacy intrusions.
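
To make the idea concrete, here is a minimal Python sketch of a chain of disclosure control rules. It is purely illustrative and does not show the actual SuperSTAR plugin API: every name in it (`QueryRejected`, `round_to_base`, `reject_if_total_below`, `apply_rules`) is hypothetical, and the numbers are invented.

```python
class QueryRejected(Exception):
    """Raised when a rule decides a query is too sensitive to answer."""


def round_to_base(base):
    """Rule that adjusts every cell to the nearest multiple of `base`."""
    def rule(cells):
        return [base * round(value / base) for value in cells]
    return rule


def reject_if_total_below(minimum):
    """Rule that rejects the whole query when the cells sum to less than `minimum`."""
    def rule(cells):
        if sum(cells) < minimum:
            raise QueryRejected(f"total is below {minimum}")
        return cells
    return rule


def apply_rules(cells, rules):
    """Pass the cell values through each rule in turn."""
    for rule in rules:
        cells = rule(cells)
    return cells


# Invented cell values, for illustration only.
rules = [reject_if_total_below(10), round_to_base(5)]
print(apply_rules([7, 13, 21], rules))
```

Keeping each rule as a small independent function makes it easy to combine them differently for different audiences.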

The work we are looking at doing in relation to current customer requests includes the following:

  • Implementing plugins with customised rounding and concealment rules. This is straightforward work as far as our current architecture is concerned, and helps our customers with these requirements to implement rules that maximise the data they can make available. For one customer, we have written a plugin that will suppress numbers less than a certain value, and any related totals. So for example, if you were suppressing all numbers in a table less than or equal to 3, a simple table would show suppression of that cell, plus any totals containing that cell. The example table demonstrates how a returned table would look. By suppressing the totals, you are preventing someone from back-calculating a value that has been suppressed.
Suppressed Table

  • Allowing custom selection of different rule combinations for testing and more advanced use of disclosure control. This is useful especially where you have a few in-house specialists who are authorised to be more lenient in terms of what rules need to be applied when responding to ad hoc information requests.
  • Extending confidentiality to apply to the output of calculations (SuperSTAR field derivations). For example, you might have a function that in some cases returns “..C” instead of a real value for certain cells as per the example above. Confidentiality can be extended to work with derived data. For example, it would be useful for determining a statistical mean or median and concealing the result if there were fewer than a certain number of contributors.
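
The suppression and derived-value rules described above can be sketched in a few lines of Python. This is a simplified illustration rather than our plugin code: the function names and the sample numbers are invented, and only the “..C” marker is taken from the examples in this post.

```python
SUPPRESSED = "..C"  # marker text from the post; everything else is illustrative


def suppress_table(rows, threshold):
    """Return rows with row totals appended, plus a final row of column totals.

    Cells less than or equal to `threshold` are replaced with SUPPRESSED, as
    is every total that includes a suppressed cell, so that the hidden value
    cannot be back-calculated from a total.
    """
    flagged = [[value <= threshold for value in row] for row in rows]
    out = []
    for row, flags in zip(rows, flagged):
        cells = [SUPPRESSED if flag else value for value, flag in zip(row, flags)]
        cells.append(SUPPRESSED if any(flags) else sum(row))
        out.append(cells)
    totals = []
    for col in range(len(rows[0])):
        column_flagged = any(flags[col] for flags in flagged)
        totals.append(SUPPRESSED if column_flagged else sum(row[col] for row in rows))
    any_flagged = any(any(flags) for flags in flagged)
    totals.append(SUPPRESSED if any_flagged else sum(sum(row) for row in rows))
    out.append(totals)
    return out


def guarded_mean(values, min_contributors):
    """Return the mean, or SUPPRESSED when there are too few contributors."""
    if len(values) < min_contributors:
        return SUPPRESSED
    return sum(values) / len(values)


# Invented counts, for illustration only.
table = suppress_table([[12, 2], [8, 15]], threshold=3)
for row in table:
    print(row)
print(guarded_mean([10, 20, 30], min_contributors=3))
print(guarded_mean([10, 20], min_contributors=3))
```

In the sample table the single small cell takes its row total, its column total and the grand total with it, while the unaffected row and column totals are still published.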

We are really keen to hear from our customers and other interested parties. If you have some recent experience in using confidentiality in SuperSTAR or elsewhere, or would like to give us any kind of related feedback, please do feel free to leave a comment or contact us directly.