Data Quality

Solution Paper

Read the Data Quality Solution Paper to find out:

  • how record checking is used when loading data.
  • about microdata records and visualizations.

A population census requires data quality checks through a post-enumeration survey. The main data quality areas to be checked are:

  • Coding validity
  • Macro editing
  • Error analysis

Coding Validity

Checks that survey responses are coded to valid classifications and investigates the number of non-applicable responses.
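As an illustration only, a coding validity check can be sketched in a few lines of Python. The classification codes, the "not applicable" code and the function name below are hypothetical and are not taken from any particular product or census.

```python
from collections import Counter

# Hypothetical classification for a single question (assumed values).
VALID_CODES = {"1", "2", "3", "4"}
NOT_APPLICABLE = "9"  # assumed code meaning "not applicable"


def coding_validity(responses, valid_codes=VALID_CODES, na_code=NOT_APPLICABLE):
    """Count valid, not-applicable and invalid codes in a list of responses.

    Returns the counts plus the (position, code) of every invalid response
    so that the offending records can be followed up.
    """
    counts = Counter()
    invalid = []
    for index, code in enumerate(responses):
        if code == na_code:
            counts["not_applicable"] += 1
        elif code in valid_codes:
            counts["valid"] += 1
        else:
            counts["invalid"] += 1
            invalid.append((index, code))
    return counts, invalid


counts, invalid = coding_validity(["1", "2", "9", "7", "3", "9", ""])
print(dict(counts))
print(invalid)
```

A high share of not-applicable codes for a question that should apply to most respondents would be a prompt for further investigation.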

Macro Editing

Runs cross-tabulations to check the data for logical correctness, ensuring that results are as expected or, where they are not, can be explained. To aid explanation, information workers have direct access from the tabulations to the underlying microdata records, which enables them to identify suspect records for further checking. They can also use various visualization options to help detect data anomalies. Techniques such as ColorVIEW provide additional analysis capabilities to highlight possible data anomalies.
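The idea of drilling from a tabulation cell down to its microdata can be sketched as follows. This is a minimal Python illustration under assumed field names and sample records; it does not represent how any specific tabulation tool stores or exposes its data.

```python
from collections import defaultdict

# Hypothetical microdata records (field names and values are assumptions).
RECORDS = [
    {"id": 1, "age_group": "0-14", "marital_status": "never married"},
    {"id": 2, "age_group": "0-14", "marital_status": "married"},
    {"id": 3, "age_group": "15-64", "marital_status": "married"},
    {"id": 4, "age_group": "15-64", "marital_status": "never married"},
    {"id": 5, "age_group": "65+", "marital_status": "widowed"},
]


def cross_tabulate(records, row_field, col_field):
    """Return {(row, col): [record ids]} so each cell keeps a link to its records."""
    cells = defaultdict(list)
    for record in records:
        cells[(record[row_field], record[col_field])].append(record["id"])
    return dict(cells)


def suspect_records(cells, implausible_cells):
    """List the record ids behind cells that an analyst considers implausible."""
    return sorted(rid for cell in implausible_cells for rid in cells.get(cell, []))


table = cross_tabulate(RECORDS, "age_group", "marital_status")
# The analyst decides which combinations are unexpected; this one is an example.
suspects = suspect_records(table, [("0-14", "married")])
print(suspects)
```

Keeping record identifiers with each cell, rather than only a count, is what makes it possible to go from an unexpected total to the individual records that produced it.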

Error Analysis

Large surveys such as a population census collect a lot of data relating to processing quality and error rates. These statistics can be tabulated to highlight any patterns or trends as the survey results are processed.

The purpose is to identify pockets of disproportionate error and significant variance from the overall average accuracy for samples grouped by:

  • Form type
  • Field type (within a single form and across all forms)
  • Geography (i.e., source of data)
  • Capture method (e.g., OCR, KFI, KFFI)
  • Capture site

These results can then also be used as a basis for causal analysis.
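A grouped error-rate comparison of this kind might look like the sketch below. The sample figures, the field names and the 1.5x threshold used to call a group "disproportionate" are all assumptions chosen for illustration; a real analysis would choose its own grouping variables and significance criteria.

```python
from collections import defaultdict

# Hypothetical quality-control samples: fields checked and fields found in error.
SAMPLES = [
    {"capture_site": "A", "capture_method": "OCR", "checked": 1000, "errors": 10},
    {"capture_site": "A", "capture_method": "KFI", "checked": 500, "errors": 5},
    {"capture_site": "B", "capture_method": "OCR", "checked": 1000, "errors": 50},
    {"capture_site": "B", "capture_method": "KFI", "checked": 500, "errors": 10},
]


def error_rates(samples, group_by):
    """Return the error rate (errors / fields checked) for each group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [errors, checked]
    for sample in samples:
        totals[sample[group_by]][0] += sample["errors"]
        totals[sample[group_by]][1] += sample["checked"]
    return {group: errors / checked for group, (errors, checked) in totals.items()}


def disproportionate(samples, group_by, ratio=1.5):
    """Return groups whose error rate exceeds `ratio` times the overall rate.

    The ratio is an assumed, illustrative threshold, not a statistical test.
    """
    overall = sum(s["errors"] for s in samples) / sum(s["checked"] for s in samples)
    rates = error_rates(samples, group_by)
    return {group: rate for group, rate in rates.items() if rate > ratio * overall}


flagged_sites = disproportionate(SAMPLES, "capture_site")
flagged_methods = disproportionate(SAMPLES, "capture_method")
print(flagged_sites)
print(flagged_methods)
```

With these made-up figures only capture site B stands out, while neither capture method does, which would point the causal analysis toward something specific to that site.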

Some of the representative sources of disproportionate error include:

  • Error in form definition
  • Registration errors
  • Location of fields
  • Error in context specification
  • Data typing (i.e., valid values)
  • Dictionaries
  • Cross field consistency checks
  • Confidence tuning
  • Error in keying rules (exceptions to “key what you see”)
  • Local writing styles
  • Configuration management failure (e.g., one capture site/cluster/processor not properly updated)
  • Software error affecting only selected fields

Contact us

To find out how our solutions meet your requirements, please contact us.