
What is a CDP?

CDP Definition

A Customer Data Platform (CDP) is defined by the CDP Institute as “packaged software that creates a persistent, unified customer database that is accessible to other systems.” Key elements of the definition are:

  • Packaged software.

    The CDP is packaged software, usually bought and controlled by business users, most often in marketing. This distinguishes it from a data warehouse or data lake, which is usually custom-built by the corporate IT department. The packaged nature of the system makes it much easier to deploy and change as new needs arise. Corporate IT must cooperate to set up and maintain the CDP, but most technical resources are usually provided by the vendor or an agency hired by marketing.

  • Persistent, unified customer database.

    The CDP creates a comprehensive view of each customer by capturing data from multiple systems, linking information related to the same customer, and storing the information to track behavior over time. The CDP contains personal identifiers used to target marketing messages and track individual-level marketing results. CDPs work primarily with data gathered by a company’s own systems about identified individuals. They may also include data from external sources and about anonymous individuals. The CDP is able to retain all details of input data indefinitely, although users may restrict what is stored and how long it is kept.

  • Accessible to other systems.

    Data stored in the CDP can be used by other systems for analysis and to manage customer interactions. The CDP restructures the data, adds calculated values such as trends and model scores, and shares the results in formats that other systems can accept. Access methods typically include APIs, database queries, and file extracts.

These features distinguish CDPs from other systems that work primarily with their own data (such as Customer Relationship Management), store only limited details for limited periods and include large volumes of externally-owned data (Data Management Platform), do not maintain a permanent database (Integration Platform), and interact directly with customers (Email, Mobile App, and Web Content Management).

Other systems may provide similar functions to a CDP. These include data warehouses and software suites or marketing clouds. Often these are limited to structured data or internal inputs.
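The three elements of the definition can be illustrated with a minimal sketch. It assumes, hypothetically, that every source system already tags its records with a shared `customer_id` (real CDPs usually have to resolve identities first), and it uses an in-memory dictionary as a stand-in for a persistent database. The class name, source names, and the calculated `order_count` and `total_spend` values are illustrative assumptions, not any vendor's data model.

```python
from collections import defaultdict

# Hypothetical inputs from two source systems, already keyed by customer_id.
ecommerce_orders = [
    {"customer_id": "C1", "order_id": "O1", "amount": 40.0},
    {"customer_id": "C1", "order_id": "O2", "amount": 60.0},
    {"customer_id": "C2", "order_id": "O3", "amount": 25.0},
]
email_events = [
    {"customer_id": "C1", "event": "open", "campaign": "spring"},
    {"customer_id": "C2", "event": "click", "campaign": "spring"},
]


class ProfileStore:
    """In-memory stand-in for a persistent, unified customer database."""

    def __init__(self):
        # customer_id -> source name -> list of original records (full detail kept)
        self._profiles = defaultdict(lambda: defaultdict(list))

    def ingest(self, source, records):
        # Link each record to its customer and retain a copy of the original detail.
        for record in records:
            self._profiles[record["customer_id"]][source].append(dict(record))

    def get_profile(self, customer_id):
        # Share a restructured view with calculated values added.
        sources = self._profiles.get(customer_id, {})
        orders = sources.get("ecommerce", [])
        return {
            "customer_id": customer_id,
            "order_count": len(orders),
            "total_spend": sum(order["amount"] for order in orders),
            "email_events": [e["event"] for e in sources.get("email", [])],
        }


store = ProfileStore()
store.ingest("ecommerce", ecommerce_orders)
store.ingest("email", email_events)
print(store.get_profile("C1"))
```

Keeping the original records and computing summary values on request mirrors the definition's emphasis on retaining detail while sharing restructured results; a production system would write to durable storage and expose the profile through APIs, queries, or file extracts.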

Types of CDPs

The CDP Institute groups CDP vendors into four categories based on the functions provided by their systems. Each category includes functions provided by the previous categories. There are great variations among vendors within each category. Categories are:

  • Data CDPs.

    These systems gather customer data from source systems, link data to customer identities, and store the results in a database available to external systems. This is the minimum set of functions required to meet the definition of a CDP. In practice, these systems can also extract audience segments and send them to external systems. Systems in this category often employ specialized technologies for data management and access. Some began as tag management or Web analytics systems and retain considerable legacy business in those areas.

  • Analytics CDPs.

    These systems provide data assembly plus analytical applications. The applications always include customer segmentation and sometimes extend to machine learning, predictive modeling, revenue attribution, and journey mapping. These systems often automate the distribution of data to other systems.

  • Campaign CDPs.

    These systems provide data assembly, analytics, and customer treatments. What distinguishes treatments from segmentation is that they can differ among individuals within the same segment. Treatments may be personalized messages, outbound marketing campaigns, real time interactions, or product or content recommendations. They often include orchestrating customer treatments across channels.

  • Delivery CDPs.

    These systems provide data assembly, analytics, customer treatments, and message delivery. Delivery may be through email, Web site, mobile apps, CRM, advertising, or several of these. Products in this category often started as delivery systems and added CDP functions later.

Glossary of CDP Related Terms

a/b test a testing method that compares results from two or more test groups whose members are similar except for being given different test treatments
adtech any system used to support advertising activities; in particular, systems that work with digital media
agentic AI a type of AI system that can act autonomously to achieve specified goals
algorithmic attribution a type of multi-touch attribution method that allocates fractions of total revenue to different marketing contacts based on statistical analysis of historical data that estimates the impact of each contact
analytics CDP a CDP whose primary functions include assembling, sharing and analyzing unified customer profiles, typically including predictive analytics
anonymization the process of removing all personal identifiers from a data set, making it impossible to connect personal data in the set with the individual who generated it
anonymous individual an individual that is not connected to any personal identifiers which can be linked to a specific person in the real world
Application Program Interface (API) a method for communicating between systems (or between components of the same system) that makes requests (“calls”) for the other system to send data or take an action (cf Webhook)
arbitration the process of selecting which message to send to an individual who is eligible to receive messages from several separate campaigns. Users must specify the selection criteria (highest immediate value, highest renewal rate, etc.). The decision is usually based on a combination of decision rules and predictive models.
artificial intelligence computer processes that mimic human thought processes
attribute data data describing individual characteristics, such as birthdate, address, or education; one person typically has a single value for each attribute and the attributes change infrequently or never
attribution the process of estimating the revenue (or other measure) caused by a particular marketing contact (or other interaction with a customer)
banner ad a type of Web display ad that appears in a box at the top, bottom or side of a page.
batch processing processing a set of data that is accumulated over time and fed into the system at once, such as a file containing all transactions during the previous day. This precludes immediate response to events reflected in the data, such as someone visiting a Web site.
behavioral data data describing individual actions, such as purchases, Web page views, and customer service calls; one person can be associated with many behaviors of the same type
big data technology to capture, store, access, and analyze very large data volumes in general, and semi-structured or unstructured data in particular.
buying stage the current relationship of a customer to a specific purchase, where relationships are described as a sequence of states (awareness, interest, selection, purchase, use, replacement). Buying stages mark progress in the buyer journey.
California Consumer Privacy Act a California law that restricts how personal data is collected and used; it gives individuals rights to reject commercial use of their data
California Privacy Rights Act a follow-on to the California Consumer Privacy Act that expands consumer rights.
campaign CDP a CDP whose primary functions include assembling, sharing and analyzing unified customer profiles, and selecting personalized messages for individuals
case study a description of how an actual user completed a business task, typically including results. Used to illustrate the capabilities of a system and the value the system helps to create.
CDP inside a system that provides CDP functionality but whose primary functions include delivery and operational processes
channel preference the likelihood that a customer will engage with messages in a particular channel. Generally based on past behavior. Used to select the most effective channel for each individual. May vary by message type.
churn the process of ending a customer relationship; a measure of how many customers stop being customers
citizen developer a person who creates software without having acquired conventional programming skills; typically a business user rather than IT professional
cloud data warehouse a data warehouse resident in a cloud database such as Snowflake, Google BigQuery, or Amazon Redshift
cloud-based a system deployed on remote servers accessed through the internet and maintained by an external vendor.
cluster analysis any statistical analysis that classifies cases into groups whose members are in some way similar to each other
collaborative filtering a type of predictive modeling that identifies products an individual is likely to purchase, based on past purchases by that individual and by other individuals who have similar purchase histories
commerce media network an advertising network that uses targeting data and/or delivery channels provided by a company that collects them during customer interactions. This includes retailers and other companies such as financial institutions, hospitality providers, and delivery services.
composable CDP an architecture that uses separate components to deliver CDP functionality, often based on a data warehouse built separately
consent management the process of collecting, classifying, retaining, accessing, and updating individual consent for data use under privacy regulations.
consent management system software that manages the consent management process. May be a stand-alone system or part of a larger product such as a CDP.
content management system (CMS) software that manages and deploys formatted information, such as Web pages and documents.
contextual advertising advertising targeted on the basis of context, such as the type of editorial content the advertising accompanies. Requires no information about the individual viewing the ad.
control group a group that is held out from testing to provide a baseline for estimating results that would have occurred without any test
cross device match a match that links two devices to the same individual, based either on deterministic or probabilistic matches
cross-channel marketing a marketing program where the same campaign includes messages in different channels
customer data platform (CDP) packaged software that builds unified, persistent customer profiles accessible to other systems, including primarily first party data and known individuals
customer experience all interactions between a customer and a company, across all stages of the customer relationship. Includes both prepurchase events (marketing and sales) and post-purchase events (product use, customer service).
customer journey analysis the process of tracking customer interactions leading up to a specified event, such as a purchase, or interactions across their entire relationship with a company. Typically includes identifying more common sequences and differences between the sequences leading to different outcomes or taken by different customer segments.
customer profile (single customer view, 360 degree view) all data associated with a person, collected and organized for easy access
customer relationship management software (CRM) software that stores details of direct interactions between a company’s customers and its sales and service personnel
data activation making use of data; specifically, sharing customer data with systems that will use it for analytics, personalization, or marketing campaigns
data CDP a CDP whose primary functions are limited to assembling and sharing unified customer profiles
data cleansing the process of making data more usable through error correction, standardization, transformations, and other processes. Exact steps will depend on the intended purpose.
data enrichment the process of adding new information to customer data, most often by importing third party data and appending it to existing customer profiles
data governance the process of controlling how data is collected and used in a system, with particular focus on ensuring data quality
data lake a collection of data copied from company systems, stored in its original forms and accessible for analysis and further processing
data management platform (DMP) software that stores anonymous customer profiles, primarily to support Web display advertising
data quality the degree to which data is fit for its intended purpose(s); more broadly, how accurately data reflects the real-world entities it represents
data standardization the process of placing data in a consistent format so that all instances of the same item are identical. May be done by applying rules (e.g., ‘all phone numbers are divided into country code and domestic number, with no separators’) or reference data (e.g., a list of formal first names and their variations, with each variation changed to the formal first name; all postal addresses changed to match postal agency standards). Important for accurate matching and reporting.
data transformation the process of converting data from one format to another. Enables disparate data to be combined.
data warehouse a collection of data copied from company systems, reorganized and often summarized for analysis
delivery CDP a CDP whose primary functions include assembling, sharing and analyzing unified customer profiles, and selecting and delivering personalized messages for individuals
demand side platform (DSP) a system used by ad buyers to purchase digital media, typically through automated bidding
derived variables data that is based on other data, usually through calculations such as a summary of purchases over time. Predictive model scores are a sophisticated type of derived variable.
descriptive analytics statistical methods that find patterns and relationships within existing data sets, such as identifying customer segments
deterministic match a match that links two personal identifiers to the same individual, based either on information provided by the individual or by the individual’s actions (e.g., logging into a customer account on a specific device; see ‘identity stitching’).
device ID an identifier linked to a device such as a computer, mobile phone, or smart TV. Device IDs may be a permanent attribute of the device itself, such as a serial number, or impermanent because they relate to software running on the device, such as a Web browser or operating system.
digital asset management (DAM) software that manages and deploys any type of digital content, including documents, videos, sound files, etc.
display advertising Web advertising that appears on Web site or social media pages and is purchased by contract or by bidding on impressions. May be targeted by Web site or by individual.
dynamic content digital content that changes depending on the recipient and other variables, typically achieved by creating a content template that includes rules which select different elements based on data about the recipient and situation (time of day, local weather, product inventory, etc.)
dynamic list a customer list that is automatically updated as customers become qualified or disqualified for the list’s selection criteria. Membership may be adjusted continuously (as new data is received) or updated each time the list is used.
earned media marketing messages that are delivered by unpaid third parties, such as the press. These are often considered to be news rather than advertising.
event triggered campaign a marketing program that is started when a specified event occurs. Typically the program is targeted at individuals and the trigger event initiates the program for a single individual (e.g., an onboarding program triggered when someone becomes a new customer).
feature extraction the process of identifying attributes within unstructured data so these can be treated as structured data. Typical examples include finding company names within a press release or products within a video. Extracted features are usually applied as tags to the original item.
fingerprinting a technique that uses device attributes such as operating system and build date to identify specific devices, even without a specific device ID. Generally done without user consent and potentially a privacy violation.
first party cookie a Web browser cookie set by the domain of the Web site the user is visiting
first party data personal data that an organization has acquired directly from an individual
first touch attribution an attribution method that allocates all revenue to the first marketing contact with a customer
fractional attribution a type of multi-touch attribution method that allocates specified fractions of total revenue to different marketing contacts based on when they occurred relative to a purchase (first, middle, last)
fuzzy match a match that links two sets of personal identifiers to the same individual, based on identifiers that are similar but not identical (e.g., two similar postal addresses)
General Data Protection Regulation a European Union regulation that restricts how personal data is collected, used, and protected; it gives individuals rights to consent, review, and demand deletion of personal data
generative AI a type of AI system that can generate text, images, music or other content, based on machine learning models that predict what is most likely to come next in a pattern
geofencing targeting of marketing and advertising messages based on the recipient’s passage into or out of a specific physical location, such as entry to a retail store. Sometimes used in combination with data known about an individual.
geotargeting targeting of marketing and advertising messages based on the recipient’s location, often in combination with other data known about the individual
golden record a record containing the version of each item that is considered the most appropriate for use; this is usually the version judged most accurate and complete. Typically shared with other company systems.
ideal customer profile the set of personal data associated with a company’s best customers. Used to define targets for sales and marketing efforts.
identity graph a set of relations among personal identifiers, indicating how each has been linked to the others and which are linked to the same individual.
identity resolution the process of linking personal identifiers to individual identities, whether known or anonymous
identity stitching the process of connecting a personal identifier to an individual through an intermediary personal identifier (e.g., new device linked to an email address provided by a customer; the device is associated with the customer even though the customer has not herself reported the connection).
incremental attribution a type of multi-touch attribution that estimates the increase in total revenue resulting from a particular type of marketing contact.
individual a distinct person; more formally, an entity linked to at least one personal identifier that can distinguish it from other entities. Identity management systems assign a unique, permanent “master ID” to each individual and then connect all personal identifiers to that master ID.
ingestion the process of gathering data from one system and loading it into another
in-memory data data which is stored in system memory for immediate access. In-memory data is typically discarded after use, although it may be copied to persistent storage first. Some systems keep all data in-memory, to enable high-speed access. This becomes more affordable as memory costs drop, although it is still typically used for relatively small data volumes.
integration platform software that moves data between systems to support processes that span multiple systems, but does not store the data internally
intent data data that indicates how likely a person is to purchase a particular product. Generally based on behaviors such as store visits, social media comments, and consumption of related Web content.
journey orchestration coordinating customer treatments over time and across channels, either to achieve a specific purpose (e.g. a marketing campaign with a defined goal) or throughout a company’s relationship with a customer
key performance indicator (KPI) a measure that correlates with achievement of specific business goals. Separate KPIs are often defined for each business project or objective.
known individual an individual connected to at least one personal identifier that can be linked to a specific person in the real world
Large Language Model (LLM) a type of generative AI system that can understand and generate human language, based on massive amounts of text inputs
last touch attribution an attribution method that allocates all revenue to the last marketing contact with a customer
lead to account match the process of connecting individual records to business units associated with those individuals. Applies to business-to-business data and relates specifically to the data structure of Salesforce.com Sales Cloud CRM, which stores people as either “leads” (individuals not connected with an account within a business) or “contacts” (individuals associated with an account). Most B2B marketing programs expect all individuals to be associated with a business.
life stage the current relationship of a customer to a business, where relationships are described as a sequence of states (prospect, new customer, existing customer, at-risk customer, lapsed customer). Life stages mark progress in the customer journey.
lifetime value the total value generated by a customer throughout their relationship with a company. Often expressed in revenue, although profit is more meaningful. May be measured in terms of future value only (e.g., for a new customer), past value only (e.g., to identify most important customers), or total value. Future values are typically discounted and may be limited to a specific time frame (e.g., the next five years).
location data data that reports the physical location of an entity over time. Based on latitude and longitude but may also include derived data such as political jurisdiction or aisle within a retail store. Typically collected by mobile devices and used to target advertising and other marketing messages.
look alike modeling a type of predictive modeling that identifies individuals similar to a company’s current customers, used to select advertising audiences.
machine learning automated processes that build predictive models with little human assistance
marketing automation system software that maintains customer and prospect lists and runs campaigns against them. Primarily used for outbound campaigns (e.g., email), but some systems also support real time interactions (e.g., Web site messages). Largely limited to data generated within the system itself and to imports from CRM systems.
martech (marketing technology) any system used to support marketing activities; in particular, systems that work with customer-level data
master data management (MDM) software software that reconciles different versions of information about an entity (person, product, location, etc.), selects the version to be used as a standard across company systems, and shares this version (called a “golden record”) with those systems. MDM systems may perform identity matching as part of their function.
Media Mix Model (or Marketing Mix Model) (MMM) an attribution method that uses statistical methods to infer the relation between marketing efforts and business results, using campaign-level data (not individual-level data)
multi-channel marketing a marketing program where separate campaigns run in different channels (email, Web, etc.)
multistep campaign a marketing program including multiple messages over time, typically including the ability to adjust later messages based on each individual’s response to earlier messages
multi-touch attribution an attribution method that allocates fractions of total revenue to different marketing contacts; multiple allocation methods are possible
multivariate analysis any statistical analysis that uses multiple variables as inputs
multivariate testing a testing method that estimates the impact of different combinations of variables on results; can estimate results from combinations that have not actually been tested
natural language processing a branch of artificial intelligence that works with human language, typically to extract features (e.g., people mentioned) or meaning (events described, intent, sentiment, etc.)
next best action the treatment that a business believes will produce the most desirable result for an individual customer; typically based on a combination of rules and predictive analytics; requires specification of the measure that is desired
no-code software software that can be built or configured without using conventional programming skills
NoSQL data store a data store that is not organized into tables, rows, and columns. There are many types, optimized for different purposes. Generally more flexible than SQL databases because columns are not predefined. Used for structured, semi-structured, and unstructured data.
offline data data collected by physical interaction such as retail purchases, local events, shipments, etc.
omni-channel marketing a marketing program where the same campaign lets customers interact in whichever channels they choose
onboarding broadly, the process of adding people to a system; narrowly, attaching personal identifiers to individual profiles so each customer can be identified and contacted across multiple channels. In particular, it refers to sending offline identifiers (name, postal address, phone number) to third party vendors who match these with online identifiers (email address, device IDs, cookies, etc.)
online data data collected by digital systems including Web, mobile apps, smart TVs, etc.
on-premises system a system deployed on servers controlled by a company. May include “private cloud” deployments as well as deployments in a company’s own data center.
operational CDP a CDP whose primary functions include assembling, sharing and analyzing unified customer profiles; selecting and delivering personalized messages; and operational activities such as order processing or customer support
out of the box data model predefined set of data objects and relationships provided with a system. Typically designed to meet the needs of a specific industry or company type.   Purpose is to save design effort compared with building a custom data model.
owned media marketing messages delivered through a company’s own channels, such as email or Web site
paid media marketing messages that are purchased, such as paid advertising
persistent data data which is stored in a stable format until the user decides to discard it. Actual retention period may be limited by legal requirements.
persistent ID a personal identifier that does not change over time and thus can be used as a permanent “master” ID. It is linked to other personal identifiers which may change (e.g. postal address)
personal data data that is linked to an individual, including attributes and behaviors
personal identifier information that can be used to identify a specific individual, either by itself or in combination with other information
personalization creating communications that are tailored to a specific individual based on data about that individual
personally identifiable information (PII) information that can be used to identify a specific individual; same as personal identifier
predictive AI a type of AI system that uses machine learning to predict future events, based on historical data
predictive analytics/model statistical methods that use data to predict outcomes such as response to promotion or membership in a group
prescriptive analytics statistical methods that use data to recommend decisions such as customer segments to contact or offers to develop
privacy by design a design approach that builds privacy requirements into system planning; this often includes collecting and exposing the least personal data needed to complete a business task
probabilistic match a match that links two personal identifiers to the same individual, based on behavior patterns that suggest but do not prove a relationship (e.g., two devices frequently used in the same places at the same times)
programmatic advertising a type of ad buying based on automated bidding for each impression, typically in real time. Originally developed for Web display advertising and now applied to other digital media.
prospecting the process of searching for new customers
pseudonymization the process of masking personal identifiers in a data set, so that someone with the right information (such as an encryption key or reference list) could reconnect personal data in the set with the individual who generated it
reactivation campaign a marketing program aimed at convincing a former customer to renew their relationship
real time responding to an event so quickly that there is no perceptible delay. Required time depends on the situation: for human interactions it is typically considered one to two seconds. For computer-to-computer interactions such as programmatic ad bidding, it may be less than 1/10th of a second.
real time access receiving a data request from an external system and returning the data to that system in real time
real time decision receiving a decision request from an external system and returning the decision in real time; often includes real time data access, calculations, and rule execution
real time ingestion loading data into a system, completing whatever processing is needed, and making the data available for use in real time
real time interaction exchanging data with a system or person in real time, such that each action takes into account all previous actions including the most recent
RealCDP the CDP Institute’s criteria used to certify that a system provides CDP functionality. Criteria include: load all data types; store all original detail; retain data as long as the user desires; assemble unified customer profiles; make profiles available to other systems.
recommendation engine a system that suggests which product to offer an individual. Selection criteria may differ (highest likelihood to purchase, highest expected value, highest future purchases, etc.). Selection method is usually a combination of business rules and predictive models. Selections are usually based on a combination of individual data (purchase history, behaviors, etc.) and context (inventory, product demand, season, etc.)
regression model a statistical method that finds estimated relationships between multiple inputs and a result and expresses these in a mathematical formula
retail media network an advertising network that uses targeting data and/or delivery channels provided by a retailer
retargeting campaign a marketing program aimed at convincing a customer to purchase a product they had apparently considered buying but did not purchase
search engine optimization the process of maintaining a Web site to achieve the highest possible ranking in Web search engines and thus attract as much organic traffic as possible.
search marketing/paid search Web advertising that appears on search engine pages and is purchased by bidding on keywords.
second party data personal data that an organization has acquired through a direct relationship with the organization that collected it as first party data
segmentation any method that divides an audience into groups of individuals who are in some way similar to each other, typically so they can be treated similarly for marketing purposes
sell side platform (SSP) a system used by ad sellers to offer digital media, typically through automated bidding
semi-structured data data that is presented and stored in a format where the elements and contents are defined together, as in event logs or key:value pairs (e.g., eye color:blue, height:5’2, gender:female)
shopping cart the area of an ecommerce Web site where buyers assemble the set of products they plan to order. Placing a product in a cart is a strong indicator of purchase intent and is often the basis for retargeting an individual with offers for the same product if they do not complete the purchase.
site tag JavaScript code embedded in a Web site that collects specified information and sends it to an external destination, such as the site owner for analytics or an ad network to track visits and ad views. Site tags may also place cookies in a Web browser to track return visits.
software development kit (SDK) instructions and tools for building software, and in particular for building connectors between two pieces of software. Often used to enable mobile apps to send data to a customer database.
SQL data store a data store organized into tables with rows and columns, where each row represents a record and each column represents a predefined data element. Used for structured data, primarily to process transactions and store attributes.
static list a customer list that is selected once and not updated or is only updated on demand.
stream test a type of test where customers are divided into groups and each group receives different treatments over time. Used to measure results of fundamental differences, such as different price or service levels, which must be held constant over extended periods to show their results.
streaming data data received in a continuous flow, such as Web site activity or location history
structured data data that is presented and stored in a fixed format where each element is in a specified location, such as the columns of a relational database table or the fields of a data file
tag manager software that manages Web site tags, typically replacing individual tags with a single tag that captures data required by multiple tags and distributes that data to the appropriate destinations.
third party cookie a Web browser cookie set by a domain other than the Web site the user is visiting
third party data personal data that an organization has acquired through a marketplace relationship with an organization that acquired it directly or indirectly
tracking pixel an image link embedded in a Web site that calls an external server to return a single pixel. Used to track site visitors. Captures less information than a site tag.
tree analysis a statistical method that classifies cases into groups with different expected results by repeatedly splitting groups of cases into subgroups, using a single variable for each split
unstructured data data that is presented and stored in a format where the elements are not defined, such as a block of text, video, or audio files
use case a description of the steps that an agent takes to complete a business task. Used to identify the capabilities a system needs to support a task and to illustrate the tasks a system can support.
warehouse-native CDP an architecture that delivers CDP functionality using an externally-built data warehouse as the primary data store. Often assembled from separate components.
Web content management (WCM) software that manages and deploys Web pages and other Web site contents.
webhook a method for communicating between Web systems that sends data to other systems, typically after an event in the originating system (cf. API)
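The recommendation engine entry describes selection as business rules plus predictive models. The Python sketch below illustrates that combination under stated assumptions: the `Product` fields, the `purchase_prob` scores (standing in for output from some unspecified predictive model), and the catalog are all hypothetical. Rules filter the candidates, and the model score picks among the survivors by highest expected value.

```python
from dataclasses import dataclass


@dataclass
class Product:
    sku: str
    margin: float    # profit per sale (hypothetical field)
    in_stock: bool


def recommend(products, purchase_prob, owned):
    """Return the eligible product with the highest expected value, or None.

    Business rules: skip out-of-stock items, items the customer already owns,
    and items without a model score. purchase_prob maps sku -> predicted
    likelihood of purchase from an assumed predictive model.
    """
    eligible = [
        p for p in products
        if p.in_stock and p.sku not in owned and p.sku in purchase_prob
    ]
    if not eligible:
        return None
    # Expected value = likelihood of purchase x margin per sale.
    return max(eligible, key=lambda p: purchase_prob[p.sku] * p.margin)


# Hypothetical catalog and model scores for one customer.
catalog = [
    Product("A", margin=10.0, in_stock=True),
    Product("B", margin=40.0, in_stock=False),
    Product("C", margin=30.0, in_stock=True),
    Product("D", margin=100.0, in_stock=True),
]
scores = {"A": 0.5, "B": 0.2, "C": 0.2, "D": 0.9}
choice = recommend(catalog, scores, owned={"D"})
print(choice.sku)  # C
```

Changing the `max` key to `purchase_prob[p.sku]` alone would switch the criterion to highest likelihood to purchase, which selects product A here instead.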
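A regression model, as defined above, expresses the relationship between inputs and a result as a formula. The sketch below fits the simplest case, a single input, by ordinary least squares (y ≈ a + b·x); models with multiple inputs extend the same least-squares idea to a matrix solve. The visit and spend figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) minimizing squared error for y ~ a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the n terms cancel out.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: site visits vs. spend for five customers.
visits = [1, 2, 3, 4, 5]
spend = [12.0, 19.0, 31.0, 39.0, 50.0]
a, b = fit_line(visits, spend)
print(f"spend ≈ {a:.2f} + {b:.2f} * visits")  # spend ≈ 1.40 + 9.60 * visits
```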
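The shopping cart and retargeting campaign entries describe targeting people who added a product to a cart but did not buy it. A minimal Python sketch of selecting those individuals from an event stream follows. The event tuple layout, the action names `add_to_cart` and `purchase`, and the 24-hour waiting period are assumptions chosen for illustration, not a standard.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def abandoned_carts(events, now, wait=timedelta(hours=24)):
    """Return {customer_id: {sku, ...}} for products added to a cart but not
    purchased, where at least `wait` has passed since the last add."""
    open_items = {}  # (customer_id, sku) -> time of most recent add
    for customer, action, sku, ts in sorted(events, key=lambda e: e[3]):
        if action == "add_to_cart":
            open_items[(customer, sku)] = ts
        elif action == "purchase":
            # A purchase closes the cart item, so it is no longer a target.
            open_items.pop((customer, sku), None)
    targets = defaultdict(set)
    for (customer, sku), added in open_items.items():
        if now - added >= wait:
            targets[customer].add(sku)
    return dict(targets)


# Hypothetical events: (customer_id, action, sku, timestamp).
events = [
    ("c1", "add_to_cart", "shoes", datetime(2024, 5, 1, 9)),
    ("c1", "purchase", "shoes", datetime(2024, 5, 1, 10)),
    ("c2", "add_to_cart", "jacket", datetime(2024, 5, 1, 9)),
    ("c3", "add_to_cart", "hat", datetime(2024, 5, 2, 8)),
]
print(abandoned_carts(events, now=datetime(2024, 5, 2, 12)))
# {'c2': {'jacket'}}
```

Customer c1 completed the purchase and c3 is still inside the waiting period, so only c2 is selected for retargeting.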
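A stream test, as defined above, holds treatments constant for each group over extended periods, so each customer must land in the same group every time they are evaluated. One common way to achieve that, sketched below in Python as an assumption rather than a prescribed method, is to hash the customer ID together with a test name. The group labels and the test name `2024-pricing-test` are hypothetical.

```python
import hashlib


def assign_group(customer_id, test_name, groups):
    """Deterministically map a customer to one treatment group.

    Hashing the test name with the customer ID keeps each customer in the
    same group for the life of the test, while a different test name yields
    an independent assignment.
    """
    key = f"{test_name}:{customer_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(groups)
    return groups[index]


# Hypothetical treatments held constant for each group during the test.
groups = ["standard price", "10% lower price", "premium service"]
for cid in ["cust-001", "cust-002", "cust-003"]:
    print(cid, assign_group(cid, "2024-pricing-test", groups))
```

Because the assignment is computed rather than stored, any system with access to the customer ID can reproduce it without a lookup table.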
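The tree analysis entry describes repeatedly splitting groups of cases into subgroups using a single variable for each split. The Python sketch below shows that idea as a small regression tree: at each node it tries every feature and threshold, keeps the split that most reduces squared error in the result, and stores the mean result in each leaf as the group's expected result. The split criterion, the depth limit, and the customer data are illustrative assumptions; production tree methods add pruning and minimum group sizes.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the group mean (0 for an empty group)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(cases, features, target):
    """Return (gain, feature, threshold) for the single-variable split that
    most reduces squared error, or None if no split is possible."""
    base = sse([c[target] for c in cases])
    best = None
    for f in features:
        # Candidate thresholds: every observed value except the largest,
        # so both subgroups are non-empty.
        for t in sorted({c[f] for c in cases})[:-1]:
            left = [c[target] for c in cases if c[f] <= t]
            right = [c[target] for c in cases if c[f] > t]
            gain = base - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


def grow(cases, features, target, depth=2):
    """Split recursively up to `depth` levels; each leaf stores the mean
    result of its cases."""
    split = best_split(cases, features, target) if depth > 0 else None
    if split is None or split[0] <= 0:
        return {"predict": mean(c[target] for c in cases), "n": len(cases)}
    _, f, t = split
    return {
        "feature": f,
        "threshold": t,
        "left": grow([c for c in cases if c[f] <= t], features, target, depth - 1),
        "right": grow([c for c in cases if c[f] > t], features, target, depth - 1),
    }


def predict(tree, case):
    """Follow the splits down to a leaf and return its expected result."""
    while "predict" not in tree:
        go_left = case[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["predict"]


# Hypothetical customers: visits, tenure in years, and whether they bought.
cases = [
    {"visits": 1, "tenure": 5, "bought": 0},
    {"visits": 2, "tenure": 1, "bought": 0},
    {"visits": 3, "tenure": 4, "bought": 0},
    {"visits": 7, "tenure": 2, "bought": 1},
    {"visits": 8, "tenure": 6, "bought": 1},
    {"visits": 9, "tenure": 3, "bought": 1},
]
tree = grow(cases, ["visits", "tenure"], "bought")
print(tree["feature"], tree["threshold"])  # visits 3
```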
