Talking Dirty

Many people (including acknowledged data quality gurus) appear to have a very restricted view of what constitutes "dirty" data and what you can do to improve it.  I recently read an article that made the case for cleaning up dirty data, but it never ventured beyond the tried and tested examples of historical data and alternative versions of names.  In my experience, dirty data can contain a wealth of hidden knowledge.  We need to think beyond merely cleaning data and understand how we can actually turn "dirty" data into an asset.

Take a look at most customer databases and you'll find that text fields, including names and addresses, are used to store additional pieces of information that are not otherwise catered for.  For instance, call centre staff will often use the customer name field to store notes about when to call a customer, how to contact them or even personal comments about them:

  • Mr Andrew Monkton - call after 8pm
  • Ms Fiona Brookes (work - 1-555-2354-321)
  • Mrs Angela Watson - husband is a miserable ****
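
To make the point concrete, here is a minimal Python sketch of how entries like these might be split into the real name and the embedded note.  The splitting rule (a note starts at a spaced dash or sits in trailing parentheses) and the names `SplitName` and `split_name_field` are my own assumptions for illustration, not a description of any particular product:

```python
import re
from typing import NamedTuple, Optional


class SplitName(NamedTuple):
    name: str
    note: Optional[str]


# Assumed rule: a note either follows a dash surrounded by spaces,
# or sits inside parentheses at the end of the field.
_NOTE = re.compile(r"\s+-\s+(?P<dash>.+)$|\s*\((?P<paren>[^)]*)\)\s*$")


def split_name_field(value: str) -> SplitName:
    """Separate a customer name from any note stored alongside it."""
    value = value.strip()
    match = _NOTE.search(value)
    if match is None:
        # Nothing that looks like a note: treat the whole field as the name.
        return SplitName(value, None)
    note = match.group("dash") or match.group("paren") or ""
    return SplitName(value[: match.start()].strip(), note.strip())


for raw in [
    "Mr Andrew Monkton - call after 8pm",
    "Ms Fiona Brookes (work - 1-555-2354-321)",
    "Mrs Angela Watson - husband is a miserable ****",
]:
    print(split_name_field(raw))
```

Because the dash must have spaces around it, a hyphenated surname is left alone.  Real data will need more rules than this, but even a crude split shows that the "dirt" is information with nowhere else to go.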

Most data quality software provides little help in understanding and improving the quality of this data.  What's needed is the ability to profile and analyse the contents of free-text fields beyond a simple count of the number of times each full-field value occurs.  The new generation of data quality solutions gives users the ability to analyse the contents of text fields and extract valuable knowledge from them.
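
As a rough illustration of profiling beyond full-field counts, the Python sketch below reduces each value to a "shape" (runs of letters become `A`, runs of digits become `9`, everything else is kept) and counts how often each shape occurs.  The shape encoding and the function names `shape` and `profile_shapes` are assumptions made for this example only:

```python
import re
from collections import Counter
from typing import Iterable


def shape(value: str) -> str:
    """Reduce a value to its pattern: letter runs -> 'A', digit runs -> '9'."""
    chars = []
    for ch in value.strip():
        if ch.isalpha():
            chars.append("A")
        elif ch.isdigit():
            chars.append("9")
        else:
            chars.append(ch)  # keep spaces and punctuation as they are
    collapsed = re.sub(r"A+", "A", "".join(chars))
    return re.sub(r"9+", "9", collapsed)


def profile_shapes(values: Iterable[str]) -> Counter:
    """Count how many values share each shape."""
    return Counter(shape(v) for v in values)


names = [
    "Mr John Smith",
    "Mrs Jane Doe",
    "Mr Andrew Monkton - call after 8pm",
    "Ms Fiona Brookes (work - 1-555-2354-321)",
]
for pattern, count in profile_shapes(names).most_common():
    print(count, pattern)
```

Every distinct name is a unique full-field value, so a simple frequency count tells you nothing.  Grouped by shape, the ordinary names collapse into one pattern and the entries carrying notes or phone numbers stand out as rare patterns worth a closer look.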

Organisations that can understand dirty data and extract golden nuggets of information from it have the power to turn what was once viewed purely as a liability into a valuable asset.

Copyright © 2005-2006 Steve Tuck and Datanomic Ltd. All Rights Reserved