CDP Vendor Classification: One More Draft

January 17, 2018

Remember the CDP classification project that’s been percolating since last September? Last week I circulated another draft approach to the CDP Institute Board of Advisors for comment. As always, their feedback was immensely helpful. In particular, they’ve convinced me that we can’t entirely separate “What makes a CDP?” from “How do CDPs differ from each other?” Answering that second question remains the primary goal for this project. But the answers to the first question are needed to put the second question in context. My solution is to present both sets of answers but keep them separate so people can focus on each as needed.

Stepping back a bit: this project began because I heard frequent complaints about the wide variation among systems classified as Customer Data Platforms. This was causing confusion in the marketplace and leading to questions about whether CDP is a meaningful category.

My first line of defense was to clarify the CDP definition itself, for example in this Library paper and, more visually, this blog post. But that doesn’t help users who need a rapid way to narrow their choices among 50+ CDP vendors. So the next step was a project to clarify which CDPs did what.

The result of this project must be easily understood by people who are just starting to explore buying a CDP. These are likely to be marketers or marketing technologists who know the problems they face but are unfamiliar with the details of CDP systems. This argues for starting with use cases, which potential buyers should understand, and relating these to CDP features, which potential buyers won’t know much about.

But most CDPs support nearly all use cases to some degree, so that discussion immediately gets lost in nuances that will only confuse buyers looking for clear direction. Detailed feature checklists would be even more baffling to people who are just starting out. My solution in the past has been narrative descriptions of system features, which can capture some of the complexity that checklists mask. But narrative descriptions also rely heavily on readers’ knowledge of what they imply.

Our recent CDP Industry Update pointed towards a solution. Looking for an easily understood way to present information from that report, I found it made sense to group vendors into three broad categories: systems that only build the unified customer database; systems that offer the database plus analytics (especially list building, a.k.a. segmentation); and systems that include the database, analytics, and “customer engagement”, which really means message selection. Those are fairly easy distinctions that any marketer or marketing technologist should understand regardless of what they know about CDPs. Equally important, most buyers should have some idea of which of these they want their system to do, so this sort of classification should meet our primary goal of helping them decide which products to examine more closely.

This insight – that the classification scheme should be based on what marketers already know about their needs – lends itself to a further subdivision based on channels supported, since that’s another thing marketers know from the start. As it happens, CDPs vary considerably in which channels they support – and unlike generic use cases, support for a particular channel can easily be tied to specific system features.

The combination of broad functions plus channels supported should allow marketers with no knowledge of CDP technology to quickly identify which CDP vendors might suit their needs. Assigning the vendors to these categories is fairly easy since the decisions can be based on a few key features that are relatively easy to assess. This won’t capture the subtle differences between systems, but that’s something we leave to the marketers themselves. Indeed, it’s something we want marketers to do because only they can determine which of the hundreds of fine differences matter in their situation. And even if we did build a list of those differences, it would soon become out of date.

This line of reasoning led to the feature list I sent to the Advisors. Their comments led me to add a few features that distinguish CDPs from other types of systems. By definition, these apply to all CDPs, so they don’t help with choosing one CDP over another. But they do help marketers understand why they’d choose a CDP over a non-CDP. A list of shared features also helps clarify that the other listed items are NOT essential CDP requirements. The Advisors’ comments showed that wasn’t necessarily clear.

In short, we’ve ended up with two classes of CDP features:

  • Shared CDP features, which are present in all CDP systems but not necessarily in other types of systems.  So they constitute both a baseline for understanding what a CDP can do and a benchmark for comparing CDPs with alternatives.  All are related to data management.
  • Differentiating CDP features, which are present in only some CDP systems. These fall into three subclasses: data management, analytics, and customer engagement.

It’s important to stress that even though all CDPs build a database, the features listed under “data management” are not required to do that.  Rather, they are advanced features that support specific use cases.  Channel-specific features illustrate this clearly: every CDP can load Web data by importing a flat file but not every CDP provides cookie management.

It’s also worth noting that the channel-specific features all relate to data management, not to analytics or engagement.  Some CDPs also provide channel-specific engagement features, such as injecting personalized messages into a Web site.  That is the sort of detail we’ve decided not to include.  What we do capture is which systems have special features to manage data from a given channel and which do some type of engagement management.   If a system manages data from a particular channel and does engagement, there’s a good chance it does engagement in that channel.  You can be even more confident of the opposite: a system that doesn’t manage data from a particular channel almost certainly won’t do engagement in that channel.   That’s all we need to know to quickly eliminate systems that don’t match the user’s needs.   Some of the remaining systems will later be eliminated after closer examination, but that’s okay: our job is only to reduce the number of systems that must be examined closely.

We’ll get to the actual feature list shortly.  But first I want to state again that this is not intended to be a comprehensive list of CDP features.  In fact, the goal is the exact opposite: to find the minimum list of features that can distinguish CDP vendors in each category.  This simplifies matters for buyers, vendors, and whoever is assembling the data.  The lack of detail also forces buyers to look more deeply at the individual products to tell them apart.  That’s important because buyers too often rank products based on feature lists without bothering to understand which features are important in their particular situation.

Let me also clarify that one CDP can belong to several categories. A CDP could have analysis or engagement features without having advanced data management features. Or it could do advanced data management in some channels but not others. The intent is for buyers to decide which of the features they need and find vendors that have them.
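
To show how that screening could work in practice, here is a minimal Python sketch. It assumes each vendor’s capabilities are recorded as a set of item names taken from the list below; the vendor names and their feature assignments are hypothetical placeholders, not real assessments. A vendor survives only if it has every feature the buyer marked as required.

```python
from typing import Dict, List, Set

# Hypothetical vendors and feature assignments, for illustration only.
VENDOR_FEATURES: Dict[str, Set[str]] = {
    "Vendor A": {"API/query access", "Persistent ID", "Javascript tag", "Segmentation"},
    "Vendor B": {"API/query access", "Persistent ID", "SDK load",
                 "Segmentation", "Content selection"},
    "Vendor C": {"Persistent ID", "Postal Address", "Name/Address Match"},
}


def shortlist(required: Set[str], vendors: Dict[str, Set[str]]) -> List[str]:
    """Return vendors offering every required feature, sorted by name.

    This only eliminates clear mismatches; the survivors still need
    closer examination.
    """
    return sorted(name for name, features in vendors.items()
                  if required <= features)  # subset test


print(shortlist({"Persistent ID", "Segmentation"}, VENDOR_FEATURES))
# ['Vendor A', 'Vendor B']
```

The subset test is deliberately crude: it says nothing about how well a vendor implements a feature, which is exactly the closer examination left to the buyer.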

Without further ado, here is the list. It’s not yet cast in concrete, so please comment publicly or privately if you see something you think should change.

Shared CDP Features

Every CDP can do all of these.  Non-CDPs may or may not.

  • Retain original detail.  The system stores data with all the detail provided when it was loaded.  This means all details associated with purchase transactions, promotion history, Web browsing logs, changes to personal data, etc.  Inputs might be physically reformatted when they’re loaded into the CDP but can be reconstructed if needed.
  • Persistent data. The system retains the input data as long as the client chooses. (This is implied by the previous item but is listed separately to simplify comparison with non-CDP systems.)
  • Individual detail.  The system can access all detailed data associated with each person.  (This is also implied by the first item but is a critical difference from systems that only store and access segment tags on customer records.)
  • Vendor-neutral access. All stored data can be exposed to any external system, not only components of the vendor’s own suite. Exposing particular items might require some setup, and access is not necessarily through a real-time query.
  • Manage Personally Identifiable Information (PII).  The system manages Personally Identifiable Information such as name, address, email, and phone number.  PII is subject to privacy and security regulations that vary based on data type, location, permissions, and other factors.
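
The first three shared features fit together naturally. Here is a minimal Python sketch of an append-only store that keeps every loaded record intact, retrieves all detail for one person, and derives a current profile by replaying personal-data changes. The record fields (`person_id`, `type`, `changes`) are hypothetical, and a real CDP would use a durable database rather than an in-memory list.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class EventStore:
    """Append-only store that keeps every input record as it was loaded."""

    events: List[Dict[str, Any]] = field(default_factory=list)

    def load(self, record: Dict[str, Any]) -> None:
        # Store a deep copy of the full record; nothing is summarized or discarded.
        self.events.append(copy.deepcopy(record))

    def history(self, person_id: str) -> List[Dict[str, Any]]:
        # Individual detail: every stored record for one person, in load order.
        return [e for e in self.events if e.get("person_id") == person_id]

    def current_profile(self, person_id: str) -> Dict[str, Any]:
        # Derived view: replay personal-data changes; later values win.
        profile: Dict[str, Any] = {}
        for e in self.history(person_id):
            if e.get("type") == "profile_update":
                profile.update(e.get("changes", {}))
        return profile


store = EventStore()
store.load({"person_id": "p1", "type": "profile_update",
            "changes": {"email": "old@example.com"}})
store.load({"person_id": "p1", "type": "purchase", "sku": "X1", "amount": 19.99})
store.load({"person_id": "p1", "type": "profile_update",
            "changes": {"email": "new@example.com"}})
```

The current profile shows only the new email, but the old one is still in the history, which is the difference from systems that overwrite records or keep only segment tags.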

Differentiating CDP Features

A CDP doesn’t have to do any of these, although many do one and some do all.  These are divided into three subclasses: data management, analytics, and customer engagement.

Data Management.

These are features that gather, assemble, and expose the CDP data.

Base Features.

These apply to all types of data.

  • API/query access. External systems can access CDP data via an API or a standard query language such as SQL. It’s just barely acceptable for a CDP not to offer this function and instead provide access through data extracts. But API or query access is much preferred and usually available. API or query access often requires some intermediate configuration, reformatting, or indexing to expose items within the CDP’s primary data store. Those are important details that buyers must explore separately.
  • Persistent ID.  The system assigns each person an internal identifier and maintains it over time despite changes or multiple versions of other identifiers, such as email address or phone number.  This allows the CDP to maintain individual history over time, even when source systems might discard old identifiers.   CDPs that use a persistent ID applied outside of the system do not meet this requirement.
  • Deterministic match (a.k.a. “identity stitching”).  The system can store multiple identifiers known to belong to the same person and link them to a shared ID (usually the persistent ID).   This enables the system to connect identifiers indirectly: for example, if an email linked to an account is opened on a particular device, subsequent activity on that device can also be linked to the account.
  • Probabilistic match (a.k.a. “cross device match”).  The system can apply statistical methods and rules to identify multiple devices used by the same person, such as computers, tablets, smart phones, and home appliances.  While many CDPs rely on third party services for this sort of matching, this item refers only to matching done by the CDP itself.
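
To make the persistent ID and deterministic match items concrete, here is a minimal Python sketch. It uses a union-find (disjoint-set) structure, which is my choice of technique rather than anything a particular vendor is known to use. The identifier types and the `P000001` ID format are hypothetical. When two groups of identifiers merge, the older persistent ID is kept so accumulated history stays attached to it. Probabilistic matching is not shown.

```python
import itertools
from typing import Dict, Iterable, Tuple

# (identifier type, value), e.g. ("email", "pat@example.com"). Hypothetical.
Identifier = Tuple[str, str]


class IdentityGraph:
    """Deterministic identity stitching with a persistent internal ID."""

    def __init__(self) -> None:
        self._parent: Dict[Identifier, Identifier] = {}
        self._pid: Dict[Identifier, str] = {}  # persistent ID, read at each root
        self._counter = itertools.count(1)

    def _find(self, node: Identifier) -> Identifier:
        # Union-find lookup with path halving.
        while self._parent[node] != node:
            self._parent[node] = self._parent[self._parent[node]]
            node = self._parent[node]
        return node

    def _add(self, node: Identifier) -> None:
        if node not in self._parent:
            self._parent[node] = node
            # Zero-padded, so string order matches creation order.
            self._pid[node] = f"P{next(self._counter):06d}"

    def link(self, identifiers: Iterable[Identifier]) -> str:
        """Record that these (non-empty) identifiers belong to one person."""
        ids = list(identifiers)
        for node in ids:
            self._add(node)
        root = self._find(ids[0])
        for node in ids[1:]:
            other = self._find(node)
            if other == root:
                continue
            # Keep the older persistent ID when two groups merge.
            if self._pid[other] < self._pid[root]:
                root, other = other, root
            self._parent[other] = root
        return self._pid[root]

    def persistent_id(self, node: Identifier) -> str:
        return self._pid[self._find(node)]


g = IdentityGraph()
g.link([("account", "A-17"), ("email", "pat@example.com")])
# The email is opened on a device: the device joins the account's group.
g.link([("email", "pat@example.com"), ("device", "d-9")])
```

After the second call, activity from `("device", "d-9")` resolves to the same persistent ID as the account. This is the indirect connection described in the deterministic match item.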

Schema-Free Data Management.

This refers to loading data without defining its contents in advance. This greatly reduces the effort to add new data sources or new data types within an existing source. It is most relevant when dealing with unstructured or semi-structured sources such as Web logs, social media comments, voice, video, or images. Schema-free loading also works with structured sources such as transaction systems. Semi-structured and unstructured data are typically managed with “big data” technologies such as Hadoop. Nearly all CDPs use some version of this technology, but it’s only essential if clients have unstructured or semi-structured sources and/or very high data volumes. Some CDPs handle very high data volumes in structured databases such as Amazon Redshift.

  • JSON load.  The system can accept and store data through JSON feeds without the user specifying in advance the specific attributes that will be included.  Additional configuration may later be required to access this data.  There are some alternatives to JSON that offer similar capabilities.
  • Schema-free data store. The system uses a data store that does not require advance specification of the elements to be stored. Examples include Hadoop, Cassandra, MongoDB, and Neo4j.
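
Here is a minimal Python sketch of the JSON load idea, using a toy in-memory list as a stand-in for the schema-free stores named above. Records are accepted as they arrive, with no attributes declared in advance. A later step discovers which attributes exist, which corresponds to the “additional configuration” that may be needed before the data can be used. The class and function names are hypothetical.

```python
import json
from typing import Any, Dict, List, Set


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested JSON objects into dotted attribute paths (lists are leaves)."""
    if isinstance(obj, dict):
        out: Dict[str, Any] = {}
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            out.update(flatten(value, path))
        return out
    return {prefix: obj}


class RawJsonStore:
    """Stores JSON records as received; no attributes are declared up front."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def load_lines(self, lines: List[str]) -> None:
        # Each line is one JSON object; new fields need no schema change.
        for line in lines:
            self.records.append(json.loads(line))

    def discovered_attributes(self) -> Set[str]:
        # Later step: find which attributes exist, e.g. to configure access.
        attrs: Set[str] = set()
        for record in self.records:
            attrs.update(flatten(record).keys())
        return attrs


store = RawJsonStore()
store.load_lines([
    '{"id": "p1", "page": "/home"}',
    '{"id": "p2", "cart": {"sku": "X1", "qty": 2}}',  # new fields, no schema change
])
```

Here `store.discovered_attributes()` returns `{"id", "page", "cart.sku", "cart.qty"}`. The second record’s nested cart fields were stored without anyone declaring them first.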

Web Site

This refers to interactions with the company’s own Web site, whether on a desktop computer or mobile device.

  • Javascript tag.  The system provides a Javascript tag that can be loaded into the client’s Web site and used to capture data about customer behaviors.  Some CDP vendors provide full tag management systems but this is not a requirement for this item.  This item does require that data captured by the Javascript tag can be associated with a customer record in the CDP database.  This is usually done with a Web tracking cookie but sometimes through other methods.
  • Cookie management.  The system can deploy and maintain Web browser cookies associated with the client’s own Web site.  The cookies can be linked to customer records in the CDP database.
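
The key requirement in both Web items is that captured behavior can be tied to a customer record. Here is a minimal Python sketch of the server side. The tag would report events with a first-party cookie ID, and once that cookie is linked to a customer (say, at login), events captured before identification become part of that customer’s history. The class, method names, and event fields are hypothetical, and the Javascript tag itself is not shown.

```python
import uuid
from collections import defaultdict
from typing import Dict, List


class WebEventCollector:
    """Server-side handling of events sent by a hypothetical Javascript tag."""

    def __init__(self) -> None:
        self.events_by_cookie: Dict[str, List[dict]] = defaultdict(list)
        self.customer_by_cookie: Dict[str, str] = {}

    @staticmethod
    def new_cookie_id() -> str:
        # Value the tag would set as a first-party cookie on the client's site.
        return uuid.uuid4().hex

    def track(self, cookie_id: str, event: dict) -> None:
        self.events_by_cookie[cookie_id].append(event)

    def identify(self, cookie_id: str, customer_id: str) -> None:
        # E.g. on login: link the cookie to a customer record in the CDP.
        self.customer_by_cookie[cookie_id] = customer_id

    def customer_events(self, customer_id: str) -> List[dict]:
        # Includes events captured before the cookie was identified.
        return [event
                for cookie, cid in self.customer_by_cookie.items()
                if cid == customer_id
                for event in self.events_by_cookie[cookie]]


collector = WebEventCollector()
cookie = collector.new_cookie_id()
collector.track(cookie, {"page": "/pricing"})   # anonymous at this point
collector.identify(cookie, "C42")               # visitor logs in
collector.track(cookie, {"page": "/checkout"})
```

Both page views, including the one recorded while the visitor was anonymous, come back from `collector.customer_events("C42")`.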

Mobile Apps

This refers to interactions with mobile apps created by the company.

  • SDK load.  The system offers a Software Development Kit (SDK) that can load data from a mobile app into the CDP database.  It must be able to associate the data with individual customers in the CDP database.  This is usually done through an app ID.  Other SDK features such as message delivery are not a requirement for this item.

Display Ads

This refers to interactions through display advertising networks, including social media networks.

  • Audience API.  The system has an API that can send customer lists from the CDP to systems that will use them as advertising audiences.  The receiving systems might be Data Management Platforms, Demand Side Platforms, advertising exchanges, social media publishers, or others.  Ability to receive information back from the advertising systems is not a requirement for this item.
  • Cookie synch.  The CDP can match its own cookie IDs with third party cookie IDs to allow the marketer to enrich profiles with external data or reach users through advertising networks.
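
Here is a minimal Python sketch of the lookup side of cookie synching and how it feeds an audience export. Each sync pairs the CDP’s own cookie with a partner’s cookie ID. When an audience is sent to that partner, only cookies with a known partner ID can be translated. The partner names and IDs are hypothetical. The redirect exchange that actually performs a sync between ad servers is not shown, and no real advertising API is called.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class CookieSyncTable:
    """Maps the CDP's own cookie IDs to partner (third-party) cookie IDs."""

    def __init__(self) -> None:
        self._partner_ids: Dict[str, Dict[str, str]] = defaultdict(dict)

    def record_sync(self, own_cookie: str, partner: str, partner_cookie: str) -> None:
        # Called when a sync request pairs our cookie with a partner's cookie.
        self._partner_ids[own_cookie][partner] = partner_cookie

    def audience_for(self, own_cookies: Iterable[str], partner: str) -> List[str]:
        """Translate an audience into the partner's IDs; unsynced cookies drop out."""
        out: List[str] = []
        for cookie in own_cookies:
            partner_cookie = self._partner_ids.get(cookie, {}).get(partner)
            if partner_cookie is not None:
                out.append(partner_cookie)
        return out


table = CookieSyncTable()
table.record_sync("c1", "dsp_x", "x-111")
table.record_sync("c2", "dsp_x", "x-222")
table.record_sync("c2", "dsp_y", "y-999")
```

An audience of `["c1", "c2", "c3"]` sent to `dsp_x` becomes `["x-111", "x-222"]`. The unsynced `c3` simply can’t be reached through that partner, which is one reason match rates matter when comparing systems.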

Offline

This refers to interactions managed through offline sources such as direct mail and retail stores, where the customer’s primary identifier is name and postal address.

  • Postal Address. The system can clean, standardize, verify, and otherwise work with postal addresses. This processing reduces inconsistencies and makes matching more effective. Systems meet this requirement so long as the address processing is built into system process flows, even if they rely on third party software. Systems that send records to external systems in a batch process do not meet this requirement.
  • Name/Address Match.  The system can find matches between different postal name/address records despite variations in spelling, missing data elements, and similar differences.    As with postal processing, systems can meet this requirement with third party matching software so long as the software is embedded in their processing flows.
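
Here is a minimal Python sketch of how standardization helps matching. Addresses are lowercased, stripped of punctuation, and given standard abbreviations, and then name and address are compared with a fuzzy similarity score. I use the standard library’s `difflib.SequenceMatcher` as a stand-in for the specialized postal and matching software mentioned above. The abbreviation table and the 0.85 threshold are illustrative assumptions, not recommended values.

```python
import re
from difflib import SequenceMatcher

# Tiny illustrative abbreviation table; real postal software uses full
# reference data and address verification.
ABBREVIATIONS = {
    "street": "st", "st.": "st", "avenue": "ave", "ave.": "ave",
    "road": "rd", "rd.": "rd", "apartment": "apt", "apt.": "apt",
}


def standardize(text: str) -> str:
    """Lowercase, replace punctuation (except periods) with spaces, abbreviate."""
    tokens = re.sub(r"[^\w\s.]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t.rstrip(".")) for t in tokens)


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1] computed on the standardized strings.
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()


def is_match(rec1: dict, rec2: dict, threshold: float = 0.85) -> bool:
    """Hypothetical rule: both name and address must be similar enough."""
    return (similarity(rec1["name"], rec2["name"]) >= threshold
            and similarity(rec1["address"], rec2["address"]) >= threshold)


a = {"name": "Jon Smith", "address": "123 Main Street, Apt. 4"}
b = {"name": "John Smith", "address": "123 Main St. Apt 4"}
```

Both addresses standardize to `123 main st apt 4`, so the spelling variation in the name is the only difference left, and `is_match(a, b)` is true. Without the standardization step the addresses would look far less alike.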

Business to Business

This refers to companies that sell to other businesses rather than to consumers.

  • Account-level data.  The system can maintain separate customer records for accounts (i.e., businesses) and for individuals within those accounts.  This means account information is stored and updated separately from individual information.  It also means that selections, campaigns, reports, analyses, and other system activities can combine data from both levels.
  • Lead to Account Match.  The system can determine which individuals should be associated with which account records, using information such as company name, address, email domain, and telephone number.  This excludes processing done by sending batch files to external vendors.
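
Here is a minimal Python sketch of lead-to-account matching that uses only two of the clues mentioned above: email domain, then company name. Free-mail domains are skipped because they say nothing about the employer. The rule order, the short free-mail list, and the record fields are hypothetical, and a real system would also weigh address and telephone number.

```python
from typing import List, Optional

# Illustrative, not complete.
FREE_MAIL = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}


def email_domain(email: str) -> Optional[str]:
    if "@" not in email:
        return None
    return email.rsplit("@", 1)[1].strip().lower()


def match_lead_to_account(lead: dict, accounts: List[dict]) -> Optional[str]:
    """Return the account_id for a lead, or None if no confident match.

    Hypothetical rule order: business email domain first, then exact
    normalized company name.
    """
    domain = email_domain(lead.get("email", ""))
    if domain and domain not in FREE_MAIL:
        for account in accounts:
            if domain in account.get("domains", []):
                return account["account_id"]
    company = lead.get("company", "").strip().lower()
    if company:
        for account in accounts:
            if account.get("name", "").strip().lower() == company:
                return account["account_id"]
    return None


accounts = [
    {"account_id": "A1", "name": "Acme Corp", "domains": ["acme.com"]},
    {"account_id": "A2", "name": "Globex", "domains": ["globex.com"]},
]
```

A lead with `pat@Acme.com` lands on `A1`. A lead with a Gmail address falls through to the company-name check, and anything else stays unmatched rather than being forced onto an account.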

Analytics.

These are applications that use the CDP data but don’t extend to selecting messages, which is the province of customer engagement.

  • Segmentation.  The system lets non-technical users define customer segments and automatically send segment member information to external systems on a user-defined schedule.  Ideally, all data would be available to use in the segment definitions and to include in the extract files.  In practice, some configuration may be needed to expose particular elements.  Systems meet this requirement regardless of whether segments are defined manually or discovered by automated processes such as cluster analysis.
  • Incremental attribution. The system has algorithms to estimate the incremental impact of different marketing activities on specified outcomes such as a purchase or conversion. Attribution is a specialized analytical process that relies on the unified customer data assembled by the CDP. Algorithms vary greatly. To qualify for this item, the algorithm must estimate the contribution of different marketing contacts to the final result. That is, fixed-rule approaches such as “first touch” or “U-shaped distribution” are not included.
  • Automated predictive. The system can generate, deploy, and refresh predictive models without involvement of a technical user such as a data scientist or statistician. This usually employs some form of machine learning. There are many different types of automated predictive modeling; systems meet this requirement if they have any of them.
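
To illustrate the difference between estimated contribution and a fixed rule, here is a minimal Python sketch of Shapley-value attribution. This is one common data-driven technique, not the only one that would qualify. Each channel gets its average marginal contribution across all orderings of channels. The hard part in practice is estimating `value(S)`, the outcome for customers exposed to a given set of channels, from the CDP’s unified data. The numbers in the example are made up.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_attribution(channels: List[str],
                        value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Shapley-value credit for each channel.

    value(S) is the outcome (e.g. conversion rate) for customers exposed to
    exactly the channel set S. Credit is each channel's marginal contribution
    v(S + {ch}) - v(S), weighted over all subsets S of the other channels.
    """
    n = len(channels)
    credit: Dict[str, float] = {}
    for ch in channels:
        others = [c for c in channels if c != ch]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {ch}) - value(s))
        credit[ch] = total
    return credit


# Made-up conversion rates by exposure set, for illustration only.
rates = {
    frozenset(): 0.0,
    frozenset({"email"}): 0.10,
    frozenset({"display"}): 0.20,
    frozenset({"email", "display"}): 0.40,
}
credit = shapley_attribution(["email", "display"], lambda s: rates[s])
```

With these numbers, email gets 0.15 and display gets 0.25, and the credits add up to the 0.40 observed for the combination. A “first touch” rule would give all of it to whichever channel happened to come first.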

Customer Engagement.

This refers to applications that select messages for individual customers.  It does not include content delivery, which is typically handled outside of the CDP.

  • Content selection.  The system can select appropriate marketing or editorial content for individual customers in the current situation, based on the data it stores about them, other information, and user instructions.  The instructions may employ fixed rules, predictive models, or a combination.   Selections may be made as part of a batch process.
  • Multi-step campaigns.  The system can select a series of marketing messages for individual customers over time, based on data and user instructions.  The message sequence is defined in advance but may change or be terminated depending on customer behaviors as the sequence is executed.
  • Real-time interactions.  The system can select appropriate marketing or editorial content for individual customers during a real-time interaction.  This requires accepting input about the customer from a customer-facing system, finding that customer’s data within the CDP, selecting appropriate content, and sending the results back to the customer-facing system for delivery.  The results might include the actual message or instructions that enable the customer-facing system to generate the message.
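
Here is a minimal Python sketch of the real-time interaction steps described above: accept input from the customer-facing system, find the customer’s data, select content, and return the result. The profile lookup is a plain dictionary, and the rules are fixed conditions checked in priority order. A predictive model score could serve as a condition just as easily. All names, fields, and content IDs are hypothetical, and transport and latency concerns are left out.

```python
from typing import Callable, Dict, List, Tuple

Profile = Dict[str, object]
# (condition on profile and context, content ID to return). Hypothetical.
Rule = Tuple[Callable[[Profile, Dict[str, str]], bool], str]


def select_content(customer_id: str,
                   context: Dict[str, str],
                   profiles: Dict[str, Profile],
                   rules: List[Rule],
                   default: str) -> str:
    """Pick content for one customer during a live interaction.

    1. Accept input from the customer-facing system (customer_id, context).
    2. Find the customer's data (here, a dict lookup).
    3. Apply rules in priority order; the first match wins.
    4. Return a content ID for the customer-facing system to deliver.
    """
    profile = profiles.get(customer_id, {})
    for condition, content_id in rules:
        if condition(profile, context):
            return content_id
    return default


rules: List[Rule] = [
    (lambda p, c: c.get("page") == "/cart" and p.get("lifetime_value", 0) > 500,
     "vip_free_shipping"),
    (lambda p, c: c.get("page") == "/cart", "standard_cart_offer"),
]
profiles: Dict[str, Profile] = {"C42": {"lifetime_value": 800},
                                "C7": {"lifetime_value": 50}}
```

On the cart page, the high-value customer `C42` gets `vip_free_shipping` and `C7` gets `standard_cart_offer`. An unknown visitor elsewhere on the site falls through to the default content.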

There you have it: 26 relatively simple items that I think offer a meaningful way to differentiate among CDP systems.  What do you think?