Data Standardisation & Cleansing
It all starts at data download. During the download process we inventory all datasets and attributes. Key attributes, such as all name and address fields, are mapped and the data standardisation process begins. Other client custom fields can be specified and included at this stage.
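As a rough illustration of what an inventory step might look like, the sketch below counts the rows in one CSV file and how many rows populate each column. The function name `inventory` and the sample columns are assumptions made for this example, not a description of the actual download tooling.

```python
import csv
import io


def inventory(csv_text):
    """Return (row_count, {column: populated_count}) for one CSV file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    populated = {name: 0 for name in (reader.fieldnames or [])}
    rows = 0
    for row in reader:
        rows += 1
        for name in populated:
            # Count a field as populated only if it holds non-blank text.
            if (row.get(name) or "").strip():
                populated[name] += 1
    return rows, populated


sample = "name,postcode\nAnn,LS1 1AA\nBob,\n"
row_count, fill_counts = inventory(sample)
```

A per-column fill count like this is one simple way to see at a glance which attributes are worth mapping.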
File types and data formats can vary hugely. We can accept, convert and read most files: as long as you can export it, we can audit and inventory it. Whether the data is a flat file or in a database format, we can help advise on data supply. One merged file, many files or hundreds of files, it's all the same to us.
Probably one of the most critical processes is mapping and linking datasets where required. Accuracy in field mapping is crucial to subsequent processing, and this is where we handle names and addresses across multiple files in multiple formats.
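To make the idea of field mapping concrete, here is a minimal sketch that renames differently labelled client columns to one standard set. The `FIELD_MAP` entries and the standard field names are hypothetical, chosen only for illustration. Unrecognised columns are kept to one side for review and never discarded.

```python
# Hypothetical mapping from client column headers to standard field names.
FIELD_MAP = {
    "fname": "first_name",
    "first name": "first_name",
    "forename": "first_name",
    "surname": "last_name",
    "last name": "last_name",
    "addr1": "address_line_1",
    "address 1": "address_line_1",
    "post code": "postcode",
    "postcode": "postcode",
}


def map_fields(record):
    """Return (mapped, unmapped): standard-named fields and the leftovers."""
    mapped, unmapped = {}, {}
    for column, value in record.items():
        # Normalise the header so "Post_Code" and "post code" match.
        key = column.strip().lower().replace("_", " ")
        standard = FIELD_MAP.get(key)
        if standard is None:
            unmapped[column] = value  # kept for review, never dropped
        else:
            mapped[standard] = value
    return mapped, unmapped


mapped, unmapped = map_fields(
    {"Forename": "Ann", "Post_Code": "LS1 1AA", "Segment": "X"}
)
```

Applying the same map to every file means that later steps only ever see one set of field names, whatever the source format.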
Every dataset we load runs through a Contentious Character check as standard. This flags all records in a dataset that appear to contain foreign or rogue characters. These can be corrected and standardised, or flagged for client review.
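A check of this kind could be sketched as follows. This assumes a simple allow-list of ASCII letters, digits, spaces and common punctuation; the real check may use different rules, and the names `ALLOWED`, `contentious_characters` and `flag_records` are invented for the example.

```python
import re

# Assumed allow-list: ASCII letters, digits, spaces and common punctuation.
ALLOWED = re.compile(r"[A-Za-z0-9 .,'&/()\-]")


def contentious_characters(value):
    """Return the distinct characters outside the allow-list, sorted."""
    return sorted({ch for ch in value if not ALLOWED.fullmatch(ch)})


def flag_records(records, fields):
    """Yield (row_index, field, characters) for each field needing review."""
    for index, record in enumerate(records):
        for field in fields:
            found = contentious_characters(record.get(field) or "")
            if found:
                yield index, field, found


rows = [{"name": "Ann O'Brien-Smith"}, {"name": "Jos\u00e9 ~Smith"}]
flags = list(flag_records(rows, ["name"]))
```

Note that an allow-list this strict also flags legitimate accented letters, which is why a flag leads to review or standardisation and not to automatic removal.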
Where needed, pre-parsing data at download helps standardise and prepare it for later processing and coding logic. We have standard exclusion rules and routines that we screen and report on for all clients. These can be customised with client-specific rules and routines.
Whether you supply one compound name field or five separate fields, all variations of name fields are loaded and the clever name bursting engine goes to work, creating an additional cleansed and standardised set of name fields, with flags on poor-quality names and records with suspect names.
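The sketch below shows, in a deliberately simplified form, what bursting a compound name field could involve. It is not the name bursting engine itself: the title list, the output field names and the "suspect" rule (empty name or any digit present) are all assumptions for illustration, and everything after the first name is treated as the surname.

```python
# Assumed list of titles; a real engine would recognise many more.
TITLES = {"mr", "mrs", "ms", "miss", "dr", "prof"}


def burst_name(full_name):
    """Split a compound name into title, first name and last name."""
    tokens = full_name.split()
    result = {"title": "", "first_name": "", "last_name": "", "suspect": False}
    if tokens and tokens[0].rstrip(".").lower() in TITLES:
        result["title"] = tokens.pop(0).rstrip(".").title()
    if len(tokens) >= 2:
        result["first_name"] = tokens[0].title()
        # Simplification: all remaining tokens are treated as the surname.
        result["last_name"] = " ".join(tokens[1:]).title()
    elif len(tokens) == 1:
        result["last_name"] = tokens[0].title()
    # Flag names with nothing left to parse or with digits in them.
    result["suspect"] = not tokens or any(ch.isdigit() for ch in full_name)
    return result


burst = burst_name("mr. john SMITH")
```

The original compound field is left untouched; the cleansed parts sit alongside it as additional fields.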
As with name fields, all variations of address fields across all files and formats are loaded and a new set of pre-parsed address fields is created, so no client data fields are ever changed. Poor-quality and suspect addresses are flagged for further enhancement and for standardisation against the Royal Mail Postal Address File.
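One way to picture pre-parsing without changing client data is sketched here: the supplied record is copied, and a postcode-shaped value is extracted into new fields. The `pp_` field names are hypothetical, and the regular expression only checks the general shape of a UK postcode; it does not validate against the Postal Address File.

```python
import re

# Matches the general shape of a UK postcode, with or without the space.
POSTCODE = re.compile(r"\b([A-Z]{1,2}[0-9][0-9A-Z]?)\s*([0-9][A-Z]{2})\b")


def pre_parse_address(record, address_fields):
    """Return a copy of record with added pre-parsed address fields."""
    parts = [str(record.get(name) or "").strip() for name in address_fields]
    combined = ", ".join(part for part in parts if part).upper()
    match = POSTCODE.search(combined)
    parsed = dict(record)  # copy, so the client's own fields stay unchanged
    parsed["pp_postcode"] = f"{match.group(1)} {match.group(2)}" if match else ""
    parsed["pp_address_suspect"] = match is None
    return parsed


original = {"addr1": "10 High St", "addr2": "Leeds ls11aa"}
parsed = pre_parse_address(original, ["addr1", "addr2"])
```

Records where no postcode can be found are flagged as suspect, which marks them out for further enhancement.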
Phone numbers, custom codes, donation prompts, segment codes: regardless of field, they are all loaded and can be standardised and enhanced where needed as part of the data project.
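Taking phone numbers as an example, standardisation could look something like the sketch below. It assumes UK numbers only, converts a +44 or 0044 prefix to a leading zero, and treats 10 or 11 digits starting with zero as plausible. The function name and the validity rule are assumptions for this example.

```python
import re


def standardise_uk_phone(raw):
    """Return (digits, looks_valid) for a UK phone number in any layout."""
    digits = re.sub(r"\D", "", raw or "")
    for prefix in ("0044", "44"):
        if digits.startswith(prefix):
            # Swap the country code for a single leading zero, which also
            # handles the "+44 (0)20 ..." style with its extra zero.
            digits = "0" + digits[len(prefix):].lstrip("0")
            break
    looks_valid = digits.startswith("0") and len(digits) in (10, 11)
    return digits, looks_valid


landline = standardise_uk_phone("+44 (0)20 7946 0000")
mobile = standardise_uk_phone("07700 900123")
```

Numbers that fail the plausibility check are returned with a false flag and their digits intact, so they can be reviewed instead of being lost.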
Once data download has completed, we have a comprehensive download report for each individual file, showing overall data quality and the suspect and poor-quality records. These stats and reports then shape the data cleansing routines and how each dataset and its suspect records are handled.
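The kind of per-file summary described above could be produced along these lines. This is a minimal sketch that assumes each record already carries boolean flag fields; the flag name `name_suspect` and the report layout are hypothetical.

```python
def download_report(records, flag_fields):
    """Summarise how many records in one file carry each quality flag."""
    total = len(records)
    report = {"total_records": total}
    for flag in flag_fields:
        count = sum(1 for record in records if record.get(flag))
        percent = round(100 * count / total, 1) if total else 0.0
        report[flag] = {"count": count, "percent": percent}
    return report


records = [
    {"name_suspect": True},
    {"name_suspect": False},
    {"name_suspect": False},
    {"name_suspect": True},
]
report = download_report(records, ["name_suspect"])
```

Counts and percentages like these make it easy to compare files and decide which cleansing routines each one needs.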