The cost and potential of data quality

Most companies have an ambition to utilize AI and run a data-driven business...

But what happens when the data quality is insufficient, missing - or maybe even incorrect?


Bad decisions happen, leading to risky misconceptions and costly business. Very costly.

Still, data quality is undervalued, overlooked, or framed as a one-time product implementation to be fixed by the data team.

But how did we end up here?
The answer is a complex combination. A combination of culture, cost, lack of business ownership, lower immediate impact, fragmented architecture, limited understanding, and lack of management priority.

This also means that addressing these challenges requires a shift in mindset: recognizing that data quality is fundamental to accurate decision-making and business success.

Realizing the cost of bad data, and the barrier it creates for analytics and AI, could be among the drivers that renew the focus on data quality.


The 10x rule

More than 30 years ago, George Labovitz and Yu Sang Chang studied the negative impact poor data can have on a business. Their study proposed the 10x rule, underlining the importance of investing in data quality and error prevention early in the data lifecycle to mitigate the significant costs that arise as data errors propagate through the various stages of processing and usage. In simpler terms, the study concluded:

"The longer a company takes to fix data quality issues - the larger the losses will be in the business".


The scale and consequences of poor data quality were described as the 10x rule, or the 1-10-100 principle:


  • 1x cost to prevent a data error
    This refers to the cost incurred to prevent an error at the data entry stage. It is the least expensive stage at which to address an issue, by ensuring accuracy during data input.


  • 10x cost to correct a data error
    If the data error is not caught at the data entry phase and proceeds to the next stage (like storage or transformation), the cost to remedy it increases substantially, requiring more resources, time, and effort to locate, correct, and reconcile.


  • 100x cost if the error remains unfixed and reaches the end-user or customer
    If an error persists through the earlier stages and reaches the end-user, consumer, or client, the cost of rectification skyrockets. This final stage includes ramifications like customer dissatisfaction, potential legal implications, lost business opportunities, and wrongful decisions or conclusions - plus the substantial effort required to remedy the error's impact.


While the amount of data and the technological landscape have changed significantly, the main point of the study remains: bad data is expensive!

The idea behind the 1-10-100 rule is not necessarily to label specific data with an exact cost, but to emphasize the importance of monitoring and addressing data quality issues early in the data lifecycle, as the cost of bad data increases with a snowballing effect as the issue moves downstream.
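
To make the snowballing effect concrete, here is a minimal back-of-the-envelope sketch in Python. The error count and unit costs are hypothetical assumptions chosen for illustration, not figures from the study:

    # Illustrative 1-10-100 calculation. All figures are hypothetical
    # assumptions for the example - not numbers from the original study.
    errors = 1_000  # faulty records entering the pipeline
    cost_per_error = {
        "1x   - prevented at data entry": 1.0,
        "10x  - corrected downstream": 10.0,
        "100x - reaching the end-user": 100.0,
    }

    for stage, unit_cost in cost_per_error.items():
        print(f"{stage}: ${errors * unit_cost:,.0f}")

    # 1x   - prevented at data entry: $1,000
    # 10x  - corrected downstream: $10,000
    # 100x - reaching the end-user: $100,000

The same thousand errors that cost a thousand dollars to prevent end up costing a hundred thousand once they reach the customer.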

Three decades ago, this was the effect estimated in the study. Today, with data volumes growing exponentially and data increasingly being the central pulse of most companies, it is fair to assume that the cost and overall effect is much higher, varying from industry to industry - especially with more regulatory requirements in place and most processes, from invoicing to delivery, running digitally on data.


You don't know what you don't know
Beyond the costs and effects mentioned above, poor data quality has another, often overlooked effect: being a barrier to analytics and AI.
Developing and utilizing the opportunities of analytics and AI simply requires solid data quality. Many companies seem to forget that. The old saying "garbage in, garbage out" still applies. Good data quality is the fuel of AI.

Recently, we have seen the challenges of poor data quality in two different AI use-cases. In both use-cases, the challenge was well described from a business perspective, the scope and value potential were clear, data was identified, AI capabilities were in place, and a data platform with appropriate tools was available - and data quality killed the party. The model(s) were simply not able to deliver. In one example, the lack of data quality even meant that the model could not distinguish between very basic pieces of information.
It sounds simple, and some of it is. Yet it remains a critical barrier to analytics and AI that many companies discover the hard way - one that also carries a high implicit cost.


Shared ownership and priorities
Moving in a direction with more AI and more analytics, data quality will need to be prioritized - to save cost and avoid wrong decisions, but also to prepare for innovation, new insights, and new revenue streams.

To do so, companies need to understand that high data quality is a shared responsibility between business domains and data departments.

In most companies, data is produced, born, or collected in the business, either in business systems or applications. Data is consumed and used by the business - and the business is, most often, the one feeling the pain or upside of data quality.

In short, data quality is very important to the business, and the business must participate in that joint responsibility in order to succeed. Data departments can support and monitor with solid data quality principles, but it is a joint responsibility of everyone involved. End-users across business domains, as well as management, need to understand that. It is a cross-functional exercise in change management, explaining the path from why to how.
Meanwhile, the 10x rule can be used to present the business case of poor data quality to management.

This way, by prioritizing and investing in data quality management at the point of entry, companies can save costs, improve decision-making, and remove potential barriers to innovation with AI.
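
As a minimal illustration of what quality management at the point of entry can look like, here is a hypothetical validation sketch in Python. The field names and rules are invented for the example, not taken from any specific system:

    # Hypothetical point-of-entry check: flag bad records before they enter
    # the pipeline, where fixing them is cheapest (the "1x" stage).
    import re

    def validate_record(record: dict) -> list[str]:
        """Return a list of data quality issues; an empty list means the record passes."""
        issues = []
        if not record.get("customer_id"):
            issues.append("missing customer_id")
        email = record.get("email", "")
        if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
            issues.append(f"invalid email: {email!r}")
        if record.get("country") not in {"DK", "SE", "NO", "DE"}:
            issues.append(f"unknown country: {record.get('country')!r}")
        return issues

    print(validate_record({"customer_id": "C-1001", "email": "info@example", "country": "DK"}))
    # ["invalid email: 'info@example'"]

Simple checks like this, applied where data is born, catch errors at the cheapest point in the lifecycle - before they snowball downstream.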
