Gov 2.0 for Providers of Official Statistics

Technical Note

Download the Gov2.0 for Providers of Official Statistics - Technical Note written by Seth Grimes, Alta Plana Corporation.
A4 Letter

 
By Seth Grimes, Sponsored by Space-Time Research

The Importance of Official Statistics

Governments collect, produce, and disseminate a huge volume and variety of data in the course of their operations. While much of this data relates to government administration (budgeting, planning, and program performance), it is official statistics that most capture our interest. These statistics, which cover demographic, economic, and social information, collectively paint hard-data local and national portraits of people and businesses. They motivate and justify government spending and private-sector investment alike.

Official statistics cover the gamut of societal concerns. Major sectors include agriculture, commerce, crime, education, health, housing, poverty, and transportation. Statistics are collected from censuses, surveys, and routine operational reports. They constitute a form of Public Intelligence, not only about public concerns but also of great interest to the public.

Information dissemination

Given programmatic needs and public interests – and in some cases in response to new mandates such as the U.S. Open Government Directive – government agencies share information internally and, as data providers, disseminate extensively to external stakeholders. They make data available to individuals, academics, businesses, and not-for-profit organizations alike. These stakeholders use Public Intelligence for a spectrum of purposes that range from deciding where to live or where to locate a business to fueling social activism and political advocacy.

… and its role in open government

The commoditization of computing power and network access and the rise of the Web have created unprecedented possibilities (and pressures) for participatory, open government, fueled by Public Intelligence. As a result, official statistics have become more important to more people than ever before. That importance only continues to grow, linked to the rise of Government 2.0: citizen-focused, performance-driven, transparent government.

Open government principles dictate that public administrations must do their best to meet the demand for official statistics, and must disseminate them as accurately, quickly, and usably as they can. In keeping with the Web 2.0 principles that have impelled Gov 2.0, administrations must accommodate a very diverse user community with disparate needs as well as an extensive variety of data access and analysis technologies. Meeting these needs, which we explore in this technical note, is what “Gov 2.0 for providers of official statistics” is all about.

The Rise of Government 2.0

Web 2.0 – a focus on access, openness, collaboration, and a rich user experience via self-service capabilities – has provided a mechanism for implementing open government. The sum of principles and mechanism adds up to a new way of governing. It combines data and technology, openness and collaboration: Government 2.0, government recreated.

As observed in an earlier technical note in this series, Making Sense of Gov 2.0,

Gov 2.0’s several principal elements -- policy, people, technology, and mission -- supported by Web 2.0 and modern, data-centric computing practices, will help create the next generation of government.

From a technical perspective, Gov 2.0 is enabled by data-centric processes coupled with analytical technology. Add Web 2.0 imperatives – to Interact, Collaborate, Partner, and Share – to transform these data-centric processes and supporting tools and complete a vision for Government 2.0 per a graphic featured in Making Sense of Gov 2.0.

[Figure: the Gov 2.0 platform graphic from Making Sense of Gov 2.0]

A platform informed by Web 2.0 imperatives -- to Interact, Collaborate, Partner, and Share -- transforms data-centric processes to complete a vision for Government 2.0.

Practical Concerns for Official Statistics Providers

Gov 2.0 may seem a complicated undertaking – there certainly is a degree of complexity to the Gov 2.0 platform model presented here – but it is really an ongoing, evolutionary process rather than a goal. From the top down, it involves an over-arching enterprise architecture and vision as sketched in the previous section. From the bottom up, it involves a series of design decisions and practices that cumulatively add up to open government. The key is opening core government processes to public participation and providing mechanisms that enable and encourage the public to participate.

Achieving openness and participation requires more than good will and a declaration of intent. The trick is balancing principles against practical concerns linked to the production, sharing, and dissemination of official statistics. These concerns are not uniformly new to Gov 2.0; whether new or not, they affect mission performance and hence are of critical importance.

Relating to official statistics and Gov 2.0 self-service data access, special, practical concerns include the following:

  • Government data must be accurate, authoritative, and timely.
  • Data must be statistically valid… and should carry a disclaimer regarding validity conditions.
  • Despite transparency imperatives, confidentiality must be protected, especially in the face of new technologies that make it easy for external users to join datasets.
  • Interfaces must be accessible to persons with disabilities.
  • Data must be distributed in machine-readable formats with sufficient metadata to facilitate the data’s use.
  • Providing “mashable” data that can be easily linked to or combined with other, disparate data is a goal, yet the cost to users, who have already paid for the collection and production of the data, must be minimal or, in many cases, nil.
  • Governments must accommodate secondary users and data aggregators who redistribute public data via value-added services and applications.

Lastly, while public users, data access, and applications matter, agencies must prioritize them within the context of their missions, which may place the needs of internal users and projects over the demands and expectations of external stakeholders.

Users, Uses, and Technology

Dissemination of official statistics is in its third generation. The first was distribution of statistical tables on paper, in reports and bound volumes. Distribution in printed form limits the audience and does not facilitate secondary analysis.

The second generation, spanning the 1970s through the mid-2000s, involved data dissemination – by both governments and secondary data providers – on magnetic tapes, diskettes, and disks and via early Website query and download interfaces. Only technically oriented users would undertake any form of substantial data analysis, typically using statistical analysis, business intelligence, and spreadsheet software.

The advent of Web 2.0 and the trend toward open government have pushed dissemination of official statistics into a third generation, where access is primarily online, types of use and users are hugely varied, and new access standards and methods are required to support highly diverse applications.

Data consumers

Where pre-Web 2.0 official-statistics consumers were fairly neatly segmented into four or five categories, Gov 2.0 seems to have more of a continuous spectrum of data stakeholders. The simplest way to describe them is as ranging from institutions to individuals, with secondary providers – commercial data aggregators, university research centers, data archives, and portals – thrown into the mix as multi-role consumer-providers.

Individuals, including most casual users, have traditionally looked for particular, focused data elements: perhaps unemployment statistics over time or a comprehensive statistical profile of their home town. Their data consumption patterns have tended to become much more dynamic, less easily satisfied with pre-built data exploration and display interfaces, instead requiring new abilities to construct custom, hybrid data analysis objects. Their statistical literacy is often lacking.

Institutional users – government internal users, businesses, researchers – have a level of subject-matter and IT expertise higher than that of individual users of official statistics. Their work is project driven, often involving policy and evaluation, and is supported by commercial data-analysis applications and enterprise-grade IT infrastructure including, increasingly, cloud and software-as-a-service computing.

Disparate applications

Pre-Web/Gov 2.0, agencies disseminated official statistics exclusively via their own destination sites. The earliest of these sites launched in the late 1990s, within a few years of the advent of the Web. These sites provided, and provide, non-standardized interfaces – typically systems of hierarchical menus and sometimes search – to allow site visitors to find data of interest. Some sites offer query interfaces while others provide only download of prebuilt data tables. Better sites have display options that include tables, charts, and thematic maps, providing in any case limited ability for users to construct their own geographic-area selections and data objects.

Gov 2.0 approaches enable stakeholders to use official statistics however they please – they support self-service data extraction and analysis – via older-style sites and via adoption of access methods that facilitate ad hoc data extraction and integration. The role of providers shifts to facilitating – but not necessarily having to offer tools for – advanced data integration and analysis activity, whether individual or collaborative.

Modern access methods

While many data consumers, including data aggregators and resellers and hard-core researchers, will continue to use older-style interfaces and dataset download options, the prevailing trend is toward data access via application programming interfaces (APIs), Web services, and syndication feeds such as RSS and Atom.
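
As a rough illustration of feed-based access, the sketch below reads an Atom feed of data releases using only the Python standard library. The feed content, the agency name, and the stats.example.gov URLs are hypothetical and inlined so the example runs offline; a real agency feed may carry different elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical Atom feed, inlined so the sketch runs without network access.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Statistics Agency: New Releases</title>
  <entry>
    <title>Unemployment rate, monthly</title>
    <link href="https://stats.example.gov/data/unemployment.csv"/>
    <updated>2009-11-06T08:30:00Z</updated>
  </entry>
  <entry>
    <title>Housing starts, monthly</title>
    <link href="https://stats.example.gov/data/housing-starts.csv"/>
    <updated>2009-11-18T08:30:00Z</updated>
  </entry>
</feed>
"""

# Atom elements live in this XML namespace.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def list_releases(feed_xml):
    """Return a (title, url, updated) tuple for each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    releases = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        link = entry.find("atom:link", ATOM)
        url = link.get("href") if link is not None else None
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM)
        releases.append((title, url, updated))
    return releases


if __name__ == "__main__":
    for title, url, updated in list_releases(SAMPLE_FEED):
        print(updated, title, url)
```

A consumer that polls such a feed learns of new or revised datasets without having to navigate the provider's site.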

Agencies are expected to variously support or at least facilitate official portals such as the U.S. government’s data.gov site; de facto portals including Google and other search engines, which have loaded government data and deliver statistics in response to data queries; and global, non-commercial initiatives, notably the emerging “web of data” that the Semantic Web has evolved to target.

Support for these access methods entails preparation and release not only of datasets in machine-processable format but also of metadata sufficient to support stand-off identification, extraction, and use of desired information.
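
To make the role of metadata concrete, here is a minimal sketch of a data release that pairs a CSV table with a JSON metadata record. The column names, metadata layout, and figures are all hypothetical and do not follow any particular agency's or standard's schema; the point is only that a program can interpret the table without human guidance.

```python
import csv
import io
import json

# Hypothetical data release: a CSV table plus a JSON metadata record.
DATA_CSV = """area_code,period,unemp_rate
A01,2009-09,9.8
A02,2009-09,7.1
"""

METADATA_JSON = """{
  "title": "Unemployment rate by area",
  "source": "Example Statistics Agency",
  "columns": {
    "area_code": {"label": "Area code", "type": "string"},
    "period": {"label": "Reference month", "type": "string"},
    "unemp_rate": {"label": "Unemployment rate", "type": "number", "unit": "percent"}
  }
}"""

# Map the metadata's (assumed) type names to Python conversions.
CASTS = {"string": str, "number": float}


def load_table(data_csv, metadata_json):
    """Parse the CSV, converting each value to the type its metadata declares."""
    meta = json.loads(metadata_json)
    columns = meta["columns"]
    rows = []
    for raw in csv.DictReader(io.StringIO(data_csv)):
        rows.append(
            {name: CASTS[columns[name]["type"]](value) for name, value in raw.items()}
        )
    return meta, rows


def describe(meta, column):
    """Build a display label, including the unit when the metadata gives one."""
    spec = meta["columns"][column]
    unit = spec.get("unit")
    return f"{spec['label']} ({unit})" if unit else spec["label"]


if __name__ == "__main__":
    meta, rows = load_table(DATA_CSV, METADATA_JSON)
    for row in rows:
        print(row["area_code"], describe(meta, "unemp_rate"), row["unemp_rate"])
```

Without the metadata record, a consumer would have to guess that `unemp_rate` is a percentage rather than a count.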

Directions Forward

This technical note has described in broad terms the direction forward: toward findability, flexibility, and do-it-yourself analysis of official statistics. A number of examples illustrate this direction nicely.

Self-service data access and analysis

The CDATA Online and TableBuilder online services, launched by the Australian Bureau of Statistics (ABS) in October 2008 and August 2009, respectively, allow users to dynamically construct ad hoc tabulations, graphs, and maps from 2006 Australian Census data. Few other government statistical agencies worldwide offer comparable capabilities to end users; most disseminate only prepackaged and highly aggregated data. The services rely on Space-Time Research’s SuperSTAR suite, which applies automatic, dynamic confidentialization controls that maintain the general statistical validity of large tabulations. End users can directly analyze detailed response data that would normally be unavailable.
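
This note does not describe how SuperSTAR's confidentialization works, so the following is not that product's method. It is a generic sketch of one simple disclosure control, suppressing small cells in an ad hoc tabulation, with a hypothetical threshold and made-up unit records.

```python
from collections import Counter

# Hypothetical unit records: one (region, age_group) pair per respondent.
RECORDS = (
    [("North", "15-24")] * 5
    + [("North", "25-64")] * 12
    + [("South", "15-24")] * 2
    + [("South", "25-64")] * 7
)

# Assumed threshold: cells with fewer respondents than this are withheld.
MIN_CELL = 3


def tabulate(records, min_cell=MIN_CELL):
    """Cross-tabulate records, replacing counts below min_cell with None."""
    counts = Counter(records)
    return {cell: (n if n >= min_cell else None) for cell, n in counts.items()}


if __name__ == "__main__":
    for (region, age_group), n in sorted(tabulate(RECORDS).items()):
        shown = "suppressed" if n is None else n
        print(f"{region:6} {age_group:6} {shown}")
```

Threshold suppression alone is not sufficient in practice: if row and column totals are also published, a withheld cell can often be recovered by subtraction, which is why production systems apply additional controls.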

Another Australian government site, VISTA 07 (Victorian Integrated Survey of Travel and Activity for 2007) from the Victoria State Department of Transport, uses STR’s SuperVIEW software to provide interactive access to tabular or pictorial versions (in maps and charts) of detailed travel information. SuperVIEW is an interactive publication, exploration, and visualization solution well suited to providing data to the public.

Innovative data visualization

Gapminder promotes “sustainable global development and achievement of the United Nations Millennium Development Goals by increased use and understanding of statistics and other information about social, economic and environmental development at local, national and global levels.” This mission should resonate with all official statistics providers, although most, unlike Gapminder, are government agencies rather than non-profit ventures.

Like SuperVIEW’s, Gapminder’s visualizations are compelling: data-rich and colorful, with the addition of animation under user control. Google acquired Gapminder’s Trendalyzer tool in 2007 and subsequently incorporated it into the Google Visualization API.

A gateway to data

The U.S. government’s Data.gov site is a portal that serves as a gateway to hundreds of U.S. federal government data sources. It provides access to data in three ways: via a “raw” data catalog linking to downloads of machine-readable, platform-independent datasets; via a tools catalog with links to agency tools or Web pages; and via a geodata catalog with links to geographic information. The initiative is supported by an effort to standardize metadata, formats, and dissemination policies.

No limits: mash-ups

The United States Geological Survey (USGS) has been in the lead in providing real-time Web access to data that can be integrated with other data and with application elements in dynamic mash-ups. The USGS’s earthquake site shows the variety of access and display methods supported by the agency for international earthquake data, from maps to data feeds to an embeddable Google gadget for Website or desktop display. Sites such as Dawn Endico’s, which maps data via the Google Map API, illustrate how end users can and do make good use of these government services.
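
The following sketch suggests what a simple mash-up might look like in code: earthquake events from a machine-readable feed are combined with a second, unrelated dataset of towns. Both datasets are hypothetical and inlined; the GeoJSON-style layout is an assumption made for illustration and is not a copy of any agency feed.

```python
import json
import math

# Hypothetical event feed in a GeoJSON-style layout (coordinates are lon, lat).
QUAKES_GEOJSON = """{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {"mag": 4.6, "place": "Example event 1"},
     "geometry": {"type": "Point", "coordinates": [-122.0, 37.5]}},
    {"type": "Feature", "properties": {"mag": 5.2, "place": "Example event 2"},
     "geometry": {"type": "Point", "coordinates": [139.7, 35.6]}},
    {"type": "Feature", "properties": {"mag": 4.1, "place": "Example event 3"},
     "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}}
  ]
}"""

# Hypothetical second dataset: town name -> (lat, lon, population).
TOWNS = {
    "Town A": (37.8, -122.3, 50000),
    "Town B": (35.7, 139.7, 120000),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def quakes_near_towns(geojson_text, towns, radius_km=100.0):
    """Attach to each event the towns, and total population, within radius_km."""
    results = []
    for feature in json.loads(geojson_text)["features"]:
        lon, lat = feature["geometry"]["coordinates"][:2]
        nearby = sorted(
            name
            for name, (t_lat, t_lon, _pop) in towns.items()
            if haversine_km(lat, lon, t_lat, t_lon) <= radius_km
        )
        results.append({
            "place": feature["properties"]["place"],
            "mag": feature["properties"]["mag"],
            "nearby": nearby,
            "population": sum(towns[name][2] for name in nearby),
        })
    return results


if __name__ == "__main__":
    for item in quakes_near_towns(QUAKES_GEOJSON, TOWNS):
        print(item)
```

Neither dataset was designed with the other in mind; the join works only because both expose coordinates in a machine-readable form.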

The official statistics common ground

These are practical examples that illustrate Gov 2.0 for providers of official statistics, but there are other directions forward. The common ground is sharing government data with the spectrum of stakeholders as a core asset that enables participatory, open government. Achieving this goal is an on-going process. The process involves a balancing act. It aims to facilitate open access and accommodate the widest variety of users, applications, and access methods while attending to official-statistics provider concerns relating to accuracy, timeliness, confidentiality, accessibility and other special needs. Where dissemination is done right, official statistics will serve as a building block for Gov 2.0 and open government initiatives.

About

STR Technical Note series

This is the third in a series of technical notes, sponsored by Space-Time Research, on Self-Service Business Intelligence for Government.

Space-Time Research

Space-Time Research (www.spacetimeresearch.com) provides EASIER, FASTER, SAFER, and cost-effective solutions for government education, welfare, transportation, tourism, health, criminology, and homeland security departments internationally. STR is the vendor of choice for the world’s most advanced National Statistics Offices.

STR creates partnerships with customers and with providers of complementary solutions, technology, and services for customers. Offering speedy and successful implementation, professional services, and support, Space-Time Research is recognized and respected as a global provider of business intelligence solutions.

Seth Grimes, Alta Plana Corporation

Technical Note author Seth Grimes is a business intelligence, data warehousing, and decision systems expert. He founded the Washington, DC-based consultancy Alta Plana Corporation (www.altaplana.com) in 1997. He has over twenty-five years’ experience designing, developing, and supporting data management and analysis systems for government agencies including the US Navy, Department of Transportation, State Department, Internal Revenue Service, Census Bureau, and NASA, as well as for the Organization for Economic Cooperation and Development and the International Monetary Fund.

Mr. Grimes is also Contributing Editor at IntelligentEnterprise.com, a Business Intelligence Network channel expert, an instructor for The Data Warehousing Institute, and founding chair of the Text Analytics Summit. He writes and speaks on information-systems strategy, data management and analysis systems, industry trends, and emerging analytical technologies. He is also a team recipient of the U.S. Vice President's Hammer Award for Reinventing Government for his work on the U.S. Census Bureau's American FactFinder system.

© 2009 Space-Time Research Pty Ltd and Alta Plana Corporation