The Urgent Need for Better Data Quality in CDPs: How Anomaly Detection Can Help
May 4, 2023

Over the past several years, Transparent Partners has helped clients evaluate and select CDPs to meet their needs, and has then gone on to implement and operationalize those CDPs for many of them. During this time, we have encountered many data quality issues. The purpose of this blog is to discuss those issues and recommend a new way of tackling them. It is essential to call out that data quality issues are not specific to one type of CDP; managing data quality is an integral part of managing any CDP implementation.
The lifeblood of any CDP is the data it ingests and acts upon. The data describes the customer, their context, their needs, and their state. Data is the fuel of any marketing campaign, and it determines both the quality of the customer experience and the effectiveness of the campaign.
Take a simple real-life example: emails not being sent to customers who had opted in to receive them. One of the CDPs we manage excluded customers from a campaign because the mobile app was erroneously assigning the opt-in flag, even though it had previously worked without issue.
In another example, I recently observed that my team spent more than eighty percent of their time handling data quality issues when executing a marketing campaign that required a new data source to be connected to the CDP. Did this surprise me? No, it did not. I have seen this happen many times before, and data quality issues are not confined to new data sources; they can frequently occur within existing data sources.
Most customer-facing platforms (e.g., websites and apps) are in a perpetual state of change. New customer experiences (web pages and app screens) are added or removed, and consequently, new data events are captured or eliminated. Data attributes are likewise added to or removed from these events. While all of these changes are occurring, the CDP continues to ingest real-time events from its established data sources.
Some CDPs offer features to help monitor the quality of the data they receive, while others do not. One common feature is a data contract. A data contract allows a CDP operator to describe the payload of an event the CDP expects to receive; the CDP can then automatically flag when the contract has been breached. For example, when an event no longer sends the marketing opt-in flag, an error is raised, and the CDP operator can start investigating what caused the change in the event structure.
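To make this concrete, here is a minimal sketch of what a data contract check might look like. The event, its fields (including the `marketing_opt_in` flag), and the `checkContract` function are hypothetical illustrations, not any specific CDP's API.

```typescript
// Hypothetical data contract: the fields and types an inbound
// event is expected to carry. All names are illustrative.
type FieldType = "string" | "boolean" | "number";

interface DataContract {
  event: string;
  requiredFields: Record<string, FieldType>;
}

const optInContract: DataContract = {
  event: "app_registration",
  requiredFields: {
    customer_id: "string",
    marketing_opt_in: "boolean", // the flag that silently disappeared
  },
};

// Returns the list of breaches (missing or mistyped fields) for one payload.
function checkContract(
  contract: DataContract,
  payload: Record<string, unknown>
): string[] {
  const breaches: string[] = [];
  for (const [field, expected] of Object.entries(contract.requiredFields)) {
    if (!(field in payload)) {
      breaches.push(`missing field: ${field}`);
    } else if (typeof payload[field] !== expected) {
      breaches.push(`wrong type for ${field}: expected ${expected}`);
    }
  }
  return breaches;
}

// A payload that breaches the contract because the opt-in flag was dropped.
console.log(checkContract(optInContract, { customer_id: "c-123" }));
// -> [ 'missing field: marketing_opt_in' ]
```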
These monitoring features are helpful during a development sprint cycle, such as when adding a new data source. However, they lose their potency once a CDP is live, receiving and processing billions of events each month. From our experience, it is unrealistic to expect a CDP operator to leverage these features during daily operations, for two reasons: firstly, the task is monotonous, and secondly, it is constantly deprioritized in favor of other, more pressing operational tasks.
What is the difference between tools that monitor data quality and those that detect and report anomalies? In our experience, monitoring requires a human to proactively look at and interpret the results. Tools that conduct anomaly detection (AD), by contrast, call attention to situations considered a deviation from the norm. Monitoring through a data contract, for example, is a pull model; it requires the human to 'pull' the information. The AD approach is more effective because it is a 'push' model, telling the human to intervene only when there is something of interest or concern. This push model overcomes the shortcomings I outlined earlier: monotony and deprioritization. It eliminates the monotonous checking of reports and overcomes deprioritization by calling humans' attention only to crucial issues.
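To make the push model concrete, here is a minimal sketch of a detector that learns the normal range of a metric and alerts only on deviations. It uses a simple z-score against a rolling history; real AD tools use far more sophisticated models, and every name and number here is illustrative.

```typescript
// Minimal push-model detector: learn the normal range of a metric from
// recent history, then alert only when a new value deviates from it.
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function stdDev(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map((x) => (x - m) ** 2)));
}

// Stay silent within the normal band; "push" an alert only on a deviation.
function checkMetric(history: number[], latest: number, zThreshold = 3): void {
  const z = (latest - mean(history)) / stdDev(history);
  if (Math.abs(z) > zThreshold) {
    // In practice this would notify a human (Slack, email, pager, etc.).
    console.log(`ALERT: ${latest} deviates from the norm (z = ${z.toFixed(1)})`);
  }
}

// Hourly opt-in event counts: steady around 1,000, then a sudden collapse.
const hourlyOptIns = [1010, 990, 1005, 1002, 998, 1008, 995, 1001];
checkMetric(hourlyOptIns, 996); // silent: within the normal band
checkMetric(hourlyOptIns, 120); // alert: opt-in events have collapsed
```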
Let me tell you about an instance when we managed the Tealium CDP. Tealium's CDP includes both data collection and standardization as core capabilities. The industry's best practice is to create a data layer across all customer data sources specifically for this purpose. Businesses using Google Tag Manager, Adobe Launch, and other data collection tools will be familiar with the concept. A dedicated data layer is more stable and less likely to be unintentionally altered or damaged. However, this is not a foolproof solution, and problems with the underlying data layer sometimes occur. Tealium allows businesses to fix most data quality issues in real time, but they must be identified first.
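For readers unfamiliar with the pattern, a data layer is a structured object the page populates independently of its presentation markup. The sketch below uses Google Tag Manager's `dataLayer` convention (Tealium's equivalent is the `utag_data` object); the event and field names are illustrative.

```typescript
// A page-level data layer, decoupled from the presentation markup.
// This uses Google Tag Manager's `dataLayer` convention; the event
// and field names are illustrative.
declare global {
  interface Window {
    dataLayer: Record<string, unknown>[];
  }
}

window.dataLayer = window.dataLayer || [];

// The page sets these values from application state, not by scraping the
// DOM, so a redesign of the markup does not silently break data capture.
window.dataLayer.push({
  event: "account_created",
  customer_id: "c-123",
  marketing_opt_in: true,
});

export {};
```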
To help identify these data quality issues, we augmented the Tealium CDP with a dedicated AD tool. We chose lightup.ai. It was simple to configure, cost-effective, and quickly championed by our CDP operators. We saw immediate value in the investment; early detection increased business trust and reduced the time to remedy a data quality issue.
While managing the Tealium CDP, we wanted a tool to alert our team if anomalies were detected in the inbound events. We also wanted to know if our audience sizes deviated from the norm. We did this by setting up lightup.ai to detect anomalies on two Tealium database tables. One table captured all the data about the inbound events, and the other contained all the details about the audiences.
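We cannot reproduce the lightup.ai configuration here, but conceptually each monitor reduced to a time series computed from one of those tables. The sketch below shows the shape of the two metrics; the table names, column names, and SQL are hypothetical stand-ins for the actual Tealium tables.

```typescript
// Conceptual shape of the two monitors. The table names, column names,
// and SQL are hypothetical stand-ins for the actual Tealium tables.
interface Metric {
  name: string;
  query: string; // how each point in the time series is computed
  granularity: "hourly" | "daily";
}

const metrics: Metric[] = [
  {
    // One series per event type: a sudden drop in a single event's
    // volume is exactly the kind of anomaly we wanted surfaced.
    name: "inbound_event_volume",
    query:
      "SELECT event_name, COUNT(*) FROM inbound_events GROUP BY event_name",
    granularity: "hourly",
  },
  {
    // One series per audience: catches audiences that balloon or collapse.
    name: "audience_size",
    query:
      "SELECT audience_id, COUNT(DISTINCT visitor_id) FROM audiences " +
      "GROUP BY audience_id",
    granularity: "daily",
  },
];
```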
In lightup.ai, creating metrics and monitors and training the models is simple. One of our favorite features is that it will show you which incidents would have been raised had the monitor been running during a specific date range. This helped us test the efficacy of a model before publishing it live.
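The idea behind this kind of backtesting is straightforward to sketch: replay historical values through the monitor and report which points would have raised an incident. The function below is a self-contained, hypothetical illustration using the same rolling z-score approach as the earlier sketch.

```typescript
// Replay a historical series through a simple monitor and report which
// points would have raised an incident, before the monitor goes live.
function backtest(series: number[], window = 24, zThreshold = 3): number[] {
  const incidents: number[] = [];
  for (let i = window; i < series.length; i++) {
    const hist = series.slice(i - window, i);
    const m = hist.reduce((a, b) => a + b, 0) / hist.length;
    const sd = Math.sqrt(
      hist.reduce((a, b) => a + (b - m) ** 2, 0) / hist.length
    );
    // Record the index of any point that would have triggered an alert.
    if (sd > 0 && Math.abs((series[i] - m) / sd) > zThreshold) {
      incidents.push(i);
    }
  }
  return incidents;
}
```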
Once live, the team no longer had to look for data quality errors; they were alerted to them. The team could then flag whether each alert was reasonable, allowing the model to continue learning and to raise more precise alerts in the future.
Lightup.ai provided considerable benefits to the team. We were alerted to errors as soon as they were detected, rather than being told of them after a business user reported an incorrect audience size. This was beneficial in two ways. First, it reduced the frustration of business users, as we found data issues before they did. Second, earlier detection allowed an issue to be remedied proactively, limiting its impact on the broader population of customer data. Ultimately, finding and fixing data errors earlier had a direct effect on the quality of the customer experiences delivered and the effectiveness of the marketing campaigns.
Lightup.ai is a powerful tool that can be paired with most CDPs. Assuring data quality in a CDP is an ongoing process with no end. If data quality goes unchecked, the consequences are limited only by one's imagination.
To learn more about how to proactively address data quality issues in CDPs, click here to register or view the recording of our live event in partnership with Lightup.ai and Tealium: https://transparent.partners/the-urgent-need-for-better-data/?utm_source=partner&utm_medium=referral&utm_campaign=cdpi
If you would like to discuss this further, we are happy to talk with you. Please contact us using this link for a free 30-minute consultation to discuss your situation.
Note
Transparent Partners does not receive any benefits from recommending lightup.ai. Transparent Partners is an independent consultancy specializing in enhancing customer experiences through a marketing lens. Our goal is to identify the unique mix of data, technology, and operational processes that will empower brands to thrive in the ever-changing marketing landscape. The purpose of this blog was to recommend a great tool to help companies battle poor data quality.