Posts Tagged ‘SuperSTAR’

SDMX Web Services

Wednesday, June 9th, 2010 by Don McIntosh

Recently, many of us at STR have been working on implementing open data formats, specifically SDMX 2.1 and DDI 3.1. Both are extremely relevant for statistical processing - DDI assumes the key position for planning, data collection, processing and microdata dissemination, while SDMX is most suited for processing and dissemination of aggregated data. Previous blog posts and news items have provided an overview of SDMX to inform our customers about how SDMX might help them with their own business processes. This blog post is all about what we are actually delivering with our mid-year SuperSTAR Release 7.0. The following SDMX functionality will be included:

  1. SDMX output from SuperWEB
  2. Building SDMX-driven SuperVIEW interactive presentations (with no SXV4 db required)
  3. RESTful SDMX Web Services

This post focuses on the Web Services, which are arguably the most important capability. The other reason I’m excited by them is that this is the first time that SDMX has been introduced directly to microdata. I’ll explain what I mean by this a bit later.

From the point of view of many data providers, the advantage of the Web Services is that they can provide customers with just the data they need, no more and no less. This can free up staff currently devoted to responding to ad hoc queries.

From the customer point of view, it opens up new possibilities for consuming the data and building unique, useful services on top of it. For example, a third party application can convert user responses from a Web app into dynamic SDMX queries and then the results from this can in turn be used to determine how the Web app should behave. Without Web Services, such an app would previously have relied on potentially stale data that was downloaded and loaded into a local database. And thanks to the detailed data model of SDMX, apps can also work out what other data sources might sensibly be combined together to produce richer, more useful results.

The other thing I’ll mention before getting into some specifics about what we’ve done is that our implementation is actually that of a RESTful API, not a “traditional” Web Service. We’re glad to see this becoming so much more popular now. SDMX originally only had standard SOAP-based Web Services defined, but we’ve based our implementation on the proposed RESTful API for SDMX version 2.1. As developers, a RESTful API is something we find a lot easier to start using, to explore, and to scale, and we think that our customers will find the same.

What we’ve done

The SDMX API that we are focused on can be broken into three logical chunks:

  1. Metadata Discovery - what data collections are available, and what concepts/classifications are used where
  2. Database Metadata Discovery - what metadata (e.g. concepts and code lists) are used within a particular SDMX dataset
  3. Queries - Defining and pulling back a slice of an SDMX data cube

We’ve implemented parts 2 & 3.  (Part 1 we will consider for a future version, but we are also looking at solving this gap in a different way, such as leveraging existing SDMX registries, which are used to collate and manage contents that are stored in SDMX repositories. The important thing to note here is that we don’t want SuperSTAR to be an island - many of the organisations we work with would want to reuse the same search and discovery mechanism across many different types of data and applications, so we’d like to learn more about how SDMX solutions can be part of such an environment before we proceed with this.)
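To make parts 2 and 3 a little more concrete, here is a minimal Python sketch of how a client might build the two kinds of request. It is an illustration only: the base URL, agency ID and dimension names are hypothetical, and the path patterns (`datastructure/{agency}/{id}/{version}` and `data/{flow}/{key}`, with a dot-separated key and `+` between alternative codes) are an assumption based on the SDMX 2.1 RESTful guidelines. The actual SuperSTAR endpoints may differ.

```python
from urllib.parse import quote

# Hypothetical base URL, used for illustration only.
BASE_URL = "https://example.org/sdmx/rest"


def structure_url(agency_id, dsd_id, version="latest"):
    """Build a request for a data structure definition (part 2)."""
    return (
        f"{BASE_URL}/datastructure/"
        f"{quote(agency_id)}/{quote(dsd_id)}/{quote(version)}"
    )


def data_url(flow_ref, dimension_order, selections):
    """Build a data query (part 3) for a slice of a cube.

    dimension_order: dimension IDs in the order the DSD defines them.
    selections: maps a dimension ID to the list of codes wanted.
    A dimension with no selection is left empty in the key (wildcard).
    """
    unknown = [d for d in selections if d not in dimension_order]
    if unknown:
        raise ValueError(f"dimensions not in the DSD: {unknown}")
    parts = ["+".join(selections.get(dim, [])) for dim in dimension_order]
    key = ".".join(parts)
    return f"{BASE_URL}/data/{quote(flow_ref, safe=',')}/{quote(key, safe='.+')}"


if __name__ == "__main__":
    print(structure_url("STR", "CENSUS"))
    print(data_url("CENSUS", ["SEX", "AGE", "REGION"],
                   {"SEX": ["F"], "REGION": ["VIC", "NSW"]}))
```

Because each request is just a URL, a developer can try a query in a browser before writing any client code, which is a large part of why we find the RESTful style easier to explore.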

Our SDMX RESTful API supports access to aggregated data that is managed by SuperSTAR. This can be from several different sources:

  1. SuperSTAR data cubes
  2. SuperSTAR tables defined by SuperWEB users
  3. SuperSTAR microdata databases

The last case is worth elaborating on, and links back to the point I mentioned earlier about introducing SDMX to microdata. Up until now, SDMX use has been limited to working with pre-aggregated data. This makes sense, especially when you consider the origins of SDMX, which is a group of organizations that deal almost solely with such aggregated statistical data and only rarely with the underlying microdata from which the statistics were derived.

From our point of view, however, and I believe from the point of view of many of our customers, dealing with microdata is very much part of the production process that they are involved in. What is useful about this is that the users are not constrained to taking slices of pre-defined cubes of data, but rather exploring and dynamically defining queries to run against the microdata. This approach can generate orders of magnitude more possible outputs and therefore relieve the provider from the burden of manually addressing many ad hoc queries that can’t be satisfied by a query against an existing cube. It does occasionally introduce other problems, namely confidentiality and performance, but these are part of our core capabilities, so our solution addresses potential drawbacks in this regard.

To make it possible to use an SDMX-based API to run tabulation queries against microdata, we’ve made some necessary innovations to the SDMX standard. Firstly, while you can query for the data structure definition (DSD) of a very large virtual cube (which is actually a SuperSTAR database), we prevent clients from requesting the full dataset for this cube - it’s simply going to be too big. What we do instead is allow for any subset of dimensions in the DSD to be combined in an SDMX query.
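The following Python sketch illustrates the idea on a toy scale. It is not the SuperSTAR implementation: the function name, the sample records and the rule that a request naming no dimensions counts as a "full dataset" request are all assumptions made for the example. It refuses full-cube requests and otherwise tabulates unit records over whichever subset of DSD dimensions the client names.

```python
from collections import Counter

# Hypothetical DSD dimensions and unit records, for illustration only.
DSD_DIMENSIONS = ["SEX", "AGE", "REGION"]
RECORDS = [
    {"SEX": "F", "AGE": "0-14", "REGION": "VIC"},
    {"SEX": "F", "AGE": "15-64", "REGION": "NSW"},
    {"SEX": "M", "AGE": "15-64", "REGION": "VIC"},
]


def tabulate(records, dsd_dimensions, requested):
    """Count unit records for each combination of the requested dimensions."""
    if not requested:
        # Assumption: naming no dimensions means asking for the whole cube.
        raise ValueError("full-cube requests are refused; name at least one dimension")
    unknown = [d for d in requested if d not in dsd_dimensions]
    if unknown:
        raise ValueError(f"dimensions not in the DSD: {unknown}")
    # One cell per distinct combination of codes; the value is the record count.
    return Counter(tuple(rec[d] for d in requested) for rec in records)


if __name__ == "__main__":
    print(tabulate(RECORDS, DSD_DIMENSIONS, ["SEX"]))
    print(tabulate(RECORDS, DSD_DIMENSIONS, ["SEX", "REGION"]))
```

A real service would also apply confidentiality rules to the resulting cells before returning them; that step is left out here.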

In addition, any tables that a user defines in SuperWEB can be accessed as SDMX datasets; both the DSD and the data from such a table can be obtained through queries against the SDMX RESTful API.

If you’ve read this whole post, you must be interested in what we are doing here. We think that the API can be very useful for many of our customers, so please leave a comment here if you have a question or something to say. Or if you want to go one step further, let us know and we’ll discuss providing you with a test package that you can use to try the API against your own data.

SuperSTAR Goodies - 6.7 Release progress

Tuesday, October 13th, 2009 by Jo Deeker

We would like to share the progress of some of the good stuff we have been doing in SuperSTAR development towards our 6.7 release.

Since transitioning to a fully agile process, we now run fortnightly iterations. From time to time, we will share the outcomes of an iteration and keep you all up to date.

Some of the key items that came out of this iteration were:

1. Record View in SuperWEB2 - we have implemented our first two user stories:
“As a SW RecordVIEW user, I want a way of seeing all the unit records that relate to a crosstab table so that I can understand the detail behind the crosstabulation”.
“As a SW RecordVIEW user, I want a filtered view of the unit records that relate to the cells in a crosstab table I choose so that I can focus on specific areas of interest”.

We have implemented RecordView using GWT in the RESTful style. GWT allows us to get a Rich Internet client user experience. Using REST means that it is easy for other clients such as SuperView to consume the RecordView service.

2. Aggregated mapping for SuperWEB2
“As a SW2 user, I want to have a faster mapping experience so that I can be more productive”.

The Mapping team have done some great work to improve the performance of our mapping solution in SuperWEB2. They have developed an ArcGISMap widget which allows SuperWEB2 to communicate directly with the ArcGIS Server via a REST interface. This means much faster zoom and pan performance with maps.

3. SuperCROSS Local Annotations Refactor – we are making good progress to get the Annotations working correctly again in SuperCROSS and are on track with our plans.

4. Automated testing – we have also made good progress in automating the testing of SuperCROSS and SuperWEB2.

If you have any questions regarding our progress on the 6.7 release, or about any SuperSTAR product, please do not hesitate to contact us at support@spacetimeresearch.com

Record VIEW Functionality in SuperWEB2 - comments welcomed

Sunday, October 4th, 2009 by Jo Deeker
Record View

A guest blog from Don McIntosh, our product manager for SuperSTAR. Please feel free to give us comments or feedback so we can incorporate your feedback into our product development while we are developing it.

What I wanted to cover in this post is a brief summary of what we are planning for RecordVIEW, as well as a few features that might come in a later release. I wanted to write about this now while we are developing it so that our customers and partners have an opportunity to comment and hopefully improve on the end result. Another thing we’ll do is provide a link to a test instance to let you play around with it once we have it up and running.

RecordVIEW is a key feature of SuperWEB - and one that is currently lacking in SuperWEB2. It gives users the ability to drill down into the records that contribute to any cell in a table and view other attributes of those records. We find that customers use it for a variety of reasons. Two of the most common reasons are identification of individuals in interesting sub-populations, and data validation. An example of the former is “give me the list of names of all students who scored above 95% in the English test”. An interesting point is that almost all the time, the records extracted via RecordVIEW need to be subsequently fed into another system for the user to complete their task. That’s a useful one for us to keep in mind, because perhaps we can add much more value by allowing some kind of direct integration between the RecordVIEW action and other systems.

The first step for RecordVIEW is actually to cover off much of the functionality we had in the original SuperWEB. That means identifying some cells, switching to the RecordVIEW tab, choosing what fields to report on, and then downloading to XLS or CSV. The major addition for the first release in comparison to what was in the original SuperWEB will be in the ease of use. The experience will be a lot more immersive, with fewer pauses for server updates and a richer UI. Click on a cell, choose RecordVIEW and then choose what fields to view. You can choose all fields, or start with none and add a select few. You can also sort the results, and selectively filter what fields you’re interested in viewing. One other key feature I’ll mention is that the results of the RecordVIEW are transparently paginated, so if you have a very long list, the browser isn’t waiting a long time to update it; it simply adds more as you scroll down.
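For readers curious about what "transparently paginated" means in practice, here is a small Python sketch of the general pattern. It is not the GWT client code: the class name, the `fetch_page(offset, limit)` callback and the page size of 50 are all hypothetical. The list holds only the rows fetched so far and asks for the next page each time the user scrolls near the bottom.

```python
class RecordList:
    """Holds the rows loaded so far and fetches more on demand."""

    def __init__(self, fetch_page, page_size=50):
        # fetch_page(offset, limit) is a hypothetical callback that returns
        # up to `limit` rows starting at `offset` (for example, via REST).
        self.fetch_page = fetch_page
        self.page_size = page_size
        self.rows = []
        self.exhausted = False

    def load_more(self):
        """Fetch the next page; return how many rows were added."""
        if self.exhausted:
            return 0
        page = self.fetch_page(len(self.rows), self.page_size)
        self.rows.extend(page)
        # A short (or empty) page means the server has no more rows.
        if len(page) < self.page_size:
            self.exhausted = True
        return len(page)


if __name__ == "__main__":
    data = list(range(120))  # stand-in for unit records on the server
    records = RecordList(lambda offset, limit: data[offset:offset + limit])
    while records.load_more():  # each call represents one "scroll" event
        print(f"{len(records.rows)} rows loaded")
```

The point of the design is that the first page appears quickly regardless of how many records sit behind the selected cells.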

We are of course very aware that for some datasets, RecordVIEW is not appropriate, due to the sensitive nature of the data. We will keep this simple: if there is confidentiality enabled for a database, then no RecordVIEW. Other permission functionality will remain unchanged from the earlier version.

Other key features we will consider later on include cell selection from other views, such as areas on a chart or map. Also, as I mentioned earlier on, we’d like to explore how RecordVIEW output might be more tightly integrated into a workflow that involves taking sets of records to feed into another application for further processing, or viewing in a certain way.

It would be interesting to hear about some usage scenarios or feature ideas for RecordVIEW from our customers. We may be able to incorporate some scenarios in our acceptance testing, and hopefully learn about some ways to make this feature smarter and more in line with users’ core needs.

RecordVIEW will be available in the Release 6.5 November service pack.

Our Quality Vision (and Addressing Our Quality Past)

Monday, August 24th, 2009 by Jo Deeker

Like all software companies, we at Space-Time Research have juggled customer demands, complex software, very different uses of our software, and ever changing requirements. This has sometimes resulted in us delivering release software to our customers that is not of a sufficient quality, and later than we planned.

In the past, and as recently as our 6.3 release of our software, our testing group has passed a release and the software has been delivered to a customer and then a critical issue has been found. One of the main reasons this happens is that every customer has a slightly different environment. We currently support Solaris, Red Hat Linux, Windows 64 bit, Windows 32 bit, Windows XP and Vista for our client applications, browsers including IE6, IE7, IE8, Chrome, Firefox, Safari. We read data from any relational database that has a JDBC driver including Oracle, SQL Server, DB2 and others, plus different types of text files. We provide mapping with ESRI ArcIMS, ArcGIS Server, Google Maps and soon Bing Maps. We test all these environments, and on our own servers our testing can pass.

Then we get out to the customer environment and encounter different environments & constraints. Not everyone can host a Tomcat application and we might have to hook to IIS. Firewalls might be an issue. Ports might be an issue. The client might operate in a remote way. Even if we don’t officially support a configuration, our clients will implement that way anyway and it’s up to us to sort it out.

Once we have the software successfully installed and configured at a client site, they then build some databases and work out how they are going to analyse or visualise their information. Every client has different types of databases, structures and uses of their information. Our testing doesn’t cover every different type of database - we try to, but of course we don’t cover everything. So sometimes we miss things - hierarchical summation options being a recent example.

Finally, our customers use the software with their own workflow. We follow a standard workflow with our automated tests, and then we conduct exploratory testing that mimics what a customer would do, but as we are not the customer, we don’t always get that exactly right either.

So, how do we improve it? What have we done and what are we doing next?

Firstly, for our 6.5 General Availability Release, Space-Time Research defined the following quality vision:

  • Timely, relevant, functioning software that works!
  • Performance, stability and resiliency focus.
  • Deliver releases of SuperSTAR that are perceived within STR and by our partners and customers as better than the previous release.

All decisions about testing, and then which bugs we fix, and when we release our software, are related back to the quality vision.

We implemented a partnership approach with some selected customers to enable them to test pre-release versions of our software. We conducted fortnightly builds, ran a couple of days of testing and then made the builds available to the customer. Builds were provided via an FTP site, and customers were able to download the software and install it in their own test environments. The customers were able to choose whether they would take a build or not. STR also hosted versions of our web applications so customers could do user interface testing without having to run their own installation and configuration.

The customers reported bugs, severity and their own priority via our normal support channel (via email to support@spacetimeresearch.com). We regularly triaged the bugs reported, and communicated via conference call with each customer to advise what we intended to do, or discuss concerns.

The benefits of this approach were clear for each customer involved:

  • Integration and configuration issues were ironed out during the pre-release phase.
  • Customer-focused testing found issues we would never have found.
  • The end delivery held no surprises.
  • We delivered on time to those customers and met their deadlines.

6.5 General Availability release is almost complete on all platforms. I’ll do another blog and announcement about that separately.

For our next release, we are implementing a fully agile development process. Another blog on that is coming too! But for our customers, please know that we want to:

  • Involve more customers in pre-release testing.
  • Collect more sample databases from customers.
  • Collect reference data sets from customers so we can validate our statistical routines.
  • Use client test beds for complex or unusual environments.
  • Open up our change management and support processes so customers can track issues they are interested in.

Cheerio

Jo