dq:View has moved...

Hi folks,


Thanks to everyone who has been following dq:View, but I've now taken the decision to move the blog to Datanomic's corporate site. I hope you'll continue to follow my witterings at http://www.datanomic.com/category/resources/blog/ or via an appropriate feed (RSS or Atom).

You can also find me online on LinkedIn as http://www.linkedin.com/in/SteveTuck, on Plaxo as http://stevetuck.myplaxo.com and if you're into Twitter, you can follow my Tweets at http://twitter.com/SteveTuck.

All the best,
Steve

P.S. There are already two new entries on data quality topics for you to read at http://www.datanomic.com/category/resources/blog/.

Trillium Software's Identity Crisis

They say that imitation is the highest form of flattery.  I thought someone was pulling my leg when I first heard about this, but it's true - Trillium Software is currently paying for an advertisement on Google that uses one word only - Datanomic!  Why would such a well-established data quality software vendor make such prominent use of a competitor's name?  And why has Trillium singled Datanomic out for this special treatment?  I'll let you make up your own mind about that.  Meanwhile, here's a screenshot I just grabbed that shows the advert.


[Screenshot of the advert]


Feel free to Google Datanomic and click on Trillium's link - it takes you to the registration page for a White Paper, but if you want the real Datanomic, simply go to www.datanomic.com.  And Kevin, well spotted but no, this doesn't mean that Datanomic has been acquired by Trillium Software!  LOL

The Wrong Guy - identity crisis at the BBC

BBC News 24's coverage of the recent court case between the Beatles' record label, Apple Corps Ltd., and the computer firm, Apple Computer, Inc., was as comprehensive and professional as usual - until they suffered a case of mistaken identity!

The corporation had invited the journalist and author of newswireless.net to express his views on the verdict, but somehow, while the real Guy Kewney (pictured right) waited in the green room, they managed to get completely the "wrong guy" in front of the cameras - the look of terror on his face when he was introduced was priceless!...

But I have to say, the poor chap muddled through the interview in fine style; if you'd like to watch it, here it is!

Perhaps there's a clue to the reason he remained so unflappable in this gentleman's true identity.  At first it was reported that he was a London cabbie, at the BBC perhaps to collect Guy Kewney, but the truth is better than that: it turns out that the man featured in the interview is Guy Goma, a business studies graduate from the Congo who was in reception because he was applying for a high-level job at the BBC.  Apparently Mr. Goma assumed that the whole thing was part of the recruitment process, but was "a little upset" that nobody asked him about his own area of expertise.

And what is Guy Goma's particular area of expertise? I'm pleased to reveal that it's Data Cleansing!  Well Guy, I hope you got the job in the end. I'm sure the BBC could use your experience, but failing that there might be an opening on "Working Lunch"!

dn:Director - a fresh approach to data quality

Why do so many organisations turn a blind eye to data quality?  One thing is for sure: the legacy data quality software providers have done little to help address this crucial business issue.  They deliver products that require years of expertise before you can use all of the functionality available (and, just as importantly, know when to use something else instead).  After a dozen years of working in the field, and having built a highly profitable consultancy business to help clients address this shortfall, I decided a year or so ago to join Datanomic.  I'm delighted to say that, last month, we celebrated the launch of dn:Director, a data quality product that is setting new standards for data quality management in the 21st Century.

I've been privileged to work on data quality projects with many leading, blue-chip companies over the years, but one of the things that struck me was that I was being asked the same questions by clients in 2004 as I was asking myself more than a decade earlier; they were identifying the same old deficiencies in data quality products and having to employ the same workarounds to resolve them.  Sure, the vendors have done something to smarten up the look of their software, but under the covers sits essentially the same code that was initially developed for mail-room efficiency in the 1980s.

Two more things struck me:

  1. All of the software vendors talked about delivering a tool for "business users" but the reality was that just about every project relied on the IT department to develop the business rules.
  2. Because of the complexity of using the software to good effect, the cost and duration of projects were prohibitive; the reason I was working with so many blue-chip companies was that they were the only ones that could afford to undertake such major projects!

These were the things that motivated me to create Tranato and subsequently to join Datanomic in 2005 and bring together the two technologies under a shared approach.  Put simply, we feel that a data quality product needs to be much more accessible - you shouldn't need to be a software guru to get value from it.

dn:Director is the result of many years' experience in data quality and data management; not just my own, but that of people like Gerry Kelley (Datanomic's VP of Professional Services) and his team, and the shared experiences of our clients and partners.  Taking Datanomic's approach (The Four Cornerstones) and methodology as its foundations, dn:Director has been built from the ground up, using the best-available modern technology.

Developing dn:Director in Java and using standards-based interfaces (such as JDBC, JMS and XML) has enabled us to deliver a technically advanced and extensible data quality product that supports both batch and real-time processes (providing data quality services through SOA).  But the thing that everybody notices first is just how easy it is to use - you should hear what our customers and partners have had to say about it:

"This is great - it's so easy to understand and configure business rules"

"I love the way that you can build rules from the data - it's so quick and intuitive"

"This will halve the time it takes to deliver a project"

For more information visit Datanomic's website or call on +44 (0)1223 228400.

Note: I know this is very commercial for a blog entry, but given the amount of personal time, energy (and money) I've committed to making dn:Director a success, I hope you'll forgive me.

Talking Dirty

Many people (including acknowledged data quality gurus) appear to have a very restricted view of what constitutes "dirty" data and what you can do to improve it.  I was reading an article recently that expounded the case for cleaning-up dirty data, but never ventured beyond tried and tested examples of historical data and alternative versions of names.  In my experience, dirty data can contain a host of hidden knowledge - we need to think beyond the idea of merely cleaning data and understand how we can actually turn "dirty" data into an asset.

For example, take a look at most customer databases and you'll find evidence of how text fields, including names and addresses, are used to store additional pieces of information which are otherwise not catered for.  For instance, call centre staff will often use the customer name to store notes about when to call a customer, how to contact them or even personal comments about them:

  • Mr Andrew Monkton - call after 8pm
  • Ms Fiona Brookes (work - 1-555-2354-321)
  • Mrs Angela Watson - husband is a miserable ****

Most data quality software provides little help in understanding and improving the quality of this data.  What's needed is the ability to profile and analyse the contents of free-text fields beyond a simple count of the number of times each full-field value occurs.  The new generation of data quality solutions provides users with the ability to analyse the contents of text fields and extract valuable knowledge from them.
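To make the idea concrete, here is a minimal Python sketch of what "analysing the contents" of a name field might look like for the three examples above. It is only an illustration under stated assumptions: the patterns, the category labels and the function name analyse_name_field are all invented for this example, and real customer data would need far richer rules than these.

```python
import re
from collections import Counter

# Hypothetical patterns, chosen only to fit the three sample values.
SPLIT = re.compile(r"\s+-\s+|\s*\(")      # where the "extra" text starts
PHONE = re.compile(r"\d[\d-]{6,}\d")      # a run of digits and dashes
CALL_TIME = re.compile(
    r"\bcall (?:after|before) \d{1,2}(?::\d{2})?\s*(?:am|pm)\b",
    re.IGNORECASE,
)

def analyse_name_field(value):
    """Split a name field into the name and any extra text, then classify the extra text."""
    parts = SPLIT.split(value, maxsplit=1)
    name = parts[0].strip()
    extra = parts[1].rstrip(") ").strip() if len(parts) > 1 else ""
    phone = PHONE.search(extra)
    if not extra:
        kind = "clean"
    elif phone:
        kind = "phone"
    elif CALL_TIME.search(extra):
        kind = "contact_time"
    else:
        kind = "comment"
    return {
        "name": name,
        "extra": extra,
        "kind": kind,
        "phone": phone.group(0) if phone else None,
    }

names = [
    "Mr Andrew Monkton - call after 8pm",
    "Ms Fiona Brookes (work - 1-555-2354-321)",
    "Mrs Angela Watson - husband is a miserable ****",
]
results = [analyse_name_field(n) for n in names]

# A profile of what is hiding in the field, rather than a count of whole values.
profile = Counter(r["kind"] for r in results)
```

A profile like this shows at a glance how many name fields carry a contact preference, a phone number or a free-form comment, each of which could then be moved to a field of its own.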

Organisations that can understand dirty data and extract golden nuggets of information from it have the power to turn what was once viewed purely as a liability into a valuable asset.

User error

When The Data Warehousing Institute asked in a survey "where does dirty data come from?" the main cause cited was sloppy data entry.  But my experience is that it's sometimes unfair to blame the users; let me give you an example.

I was asked to look at some problem addresses for a UK-based client's data migration project.  The dodgy records were coming from the company's CRM system and the users entering the data were being blamed for the poor quality.  When I looked at the data, I spotted a trend - all of the information was there, just in the wrong order, so I asked to see the data entry screen.

I talked to some of the data entry staff, and watched them enter some new customer records.  Every record they entered looked fine; the addresses on the screen read perfectly.  The problem was the screen layout and the fields that they were putting the address into.

For some reason best known to the CRM system vendor, the address was represented as low-level elements, which appeared on the screen in a 2-column tabular format.  The data entry staff had no idea what a dependent thoroughfare or a double dependent locality was, so they simply entered the address as they would expect to see it on an envelope, using the fields in the left-hand column.

The problem was compounded by the fact that the fields weren't in the order that they occur in a correctly formatted address.  During the migration, the addresses were rebuilt, but this time they followed the Royal Mail's standards; in short, the address was put back together in a different order.
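The effect can be reproduced in a few lines of Python. This is a sketch only: the left-hand column layout below is hypothetical (I am not reproducing the real CRM screen), the sample address is made up, and the element order is a simplified subset of the Royal Mail sequence.

```python
# Simplified subset of the Royal Mail element order, premises first.
ROYAL_MAIL_ORDER = [
    "building_name",
    "dependent_thoroughfare",
    "thoroughfare",
    "double_dependent_locality",
    "dependent_locality",
    "post_town",
]

# Hypothetical left-hand column of the data entry screen, top to bottom.
LEFT_COLUMN_FIELDS = [
    "building_name",
    "double_dependent_locality",
    "dependent_thoroughfare",
    "post_town",
]

def enter_as_envelope(lines, screen_fields):
    """Type the address line by line into the fields, as the staff did."""
    return dict(zip(screen_fields, lines))

def rebuild(record, element_order):
    """Reassemble the address from its elements in the given order."""
    return [record[field] for field in element_order if record.get(field)]

# Made-up sample address, written as it would appear on an envelope.
envelope = ["Rose Cottage", "12 High Street", "Little Shelford", "Cambridge"]

record = enter_as_envelope(envelope, LEFT_COLUMN_FIELDS)
on_screen = rebuild(record, LEFT_COLUMN_FIELDS)   # reads perfectly
migrated = rebuild(record, ROYAL_MAIL_ORDER)      # same lines, new order
```

On the screen the address reads exactly as typed, but after the rebuild "Little Shelford" sits above "12 High Street": every piece of information is still there, just in the wrong order.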

So who should we blame for these data quality issues?  Should we put it down to "user error" or should we look to the people responsible for the poorly thought-through and over-engineered CRM system?

Copyright © 2005-2006 Steve Tuck and Datanomic Ltd. All Rights Reserved.
