CDP Differentiators: Identified vs. Anonymous Data

December 27, 2016

Many systems assemble customer data. This is confusing for marketers, who often have a hard time understanding how the systems differ. Here’s a look at one important distinction: whether a system works with identified or anonymous individuals.

Identified individuals are known by name or an identifier that can be linked to a name, such as phone number, email address, credit card, or bank account. Anonymous individuals can’t be linked to a name but may still have an identifier that can be tracked over time, such as a browser cookie, device ID or account log-in. This means that even anonymous individuals can have a customer profile with detailed information. But the anonymous profile is typically limited to one source: without a personal identifier, there’s no way to link it to data in different systems that’s about the same person.

In recent years, the most important anonymous identifiers have been Web browser cookies. These are deposited on a computer during a Web or email interaction and can be read during subsequent interactions. If the visitor identifies herself by filling out a form or logging into an existing account, the cookie can be linked to her identity. But most site visitors don’t identify themselves, so their cookies remain anonymous.

The primary use of anonymous cookies has been to create advertising audiences. These are built by linking each cookie to attributes derived from Web behaviors such as content consumption. Audiences are built by selecting cookies with specified combinations of attributes. Technical details differ, but you can safely visualize this data as a spreadsheet where the first column holds the cookie ID and other columns contain values for attributes. Key design challenges in building these systems include handling millions (sometimes billions) of cookies, allowing thousands of attributes, easily adding new attributes, and selecting records very quickly.
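The "single spreadsheet" picture above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the cookie IDs, attribute names, and the helper functions build_audience and add_attribute are all hypothetical, and a real system would use storage built for millions of rows rather than an in-memory dictionary.

```python
# Hypothetical cookie-level table: one "row" per cookie ID,
# one "column" per attribute. All names and values are illustrative.
cookie_profiles = {
    "cookie_001": {"read_sports": True, "read_finance": False},
    "cookie_002": {"read_sports": True, "read_finance": True},
    "cookie_003": {"read_sports": False, "read_finance": True},
}


def build_audience(profiles, **required):
    """Return the cookie IDs whose attributes match every required value."""
    return sorted(
        cookie_id
        for cookie_id, attrs in profiles.items()
        if all(attrs.get(name) == value for name, value in required.items())
    )


def add_attribute(profiles, name, values, default=None):
    """Add a new attribute "column"; cookies without a value get the default."""
    for cookie_id, attrs in profiles.items():
        attrs[name] = values.get(cookie_id, default)


# Select an audience by combining attributes.
sports_and_finance = build_audience(
    cookie_profiles, read_sports=True, read_finance=True
)

# Adding a new attribute does not require changing the existing rows' structure.
add_attribute(cookie_profiles, "visited_pricing", {"cookie_001": True}, default=False)
pricing_visitors = build_audience(cookie_profiles, visited_pricing=True)
```

The sketch shows why this model is simple: every record has the same shape, so selection is just a filter over one table.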

Identified data is a different story. Because the data can be linked to a specific individual, the system often includes data from different sources. Each source may have its own identifier: a cookie ID from a Web site, an email address from CRM, a device ID from a mobile app, and so on. To accommodate this, the system needs a central “spreadsheet” that lists all the identifiers associated with a particular individual.  Other “spreadsheets” hold the actual data from the source systems: that is, Web page views, purchases, phone calls, emails sent, etc. Each row on the central spreadsheet represents one individual and the columns are the different identifiers. On the other spreadsheets, each row represents a particular item (page view, order, phone call, etc.) and the columns are the details about those items (date, page name, product purchased, price, etc.). The columns are different in each spreadsheet, reflecting attributes of the items they represent. But every spreadsheet needs a column with a customer identifier.  This is what the system matches to the identifiers in the central spreadsheet when it needs to create a unified customer view by assembling all data associated with an individual.
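As a rough illustration of the "multi-spreadsheet" approach just described, the Python sketch below keeps one central identity table and three source tables, then assembles a unified view by matching each source's identifier column against the central table. The table contents, the person_id key, and the unified_view function are assumptions made for the example, not a description of any particular product.

```python
# Central "spreadsheet": one row per individual, one column per identifier.
# All identifiers and values below are made up for illustration.
identity_table = [
    {"person_id": "P1", "cookie_id": "ck_9f2", "email": "ana@example.com", "device_id": "dev_77"},
    {"person_id": "P2", "cookie_id": "ck_41b", "email": "ben@example.com", "device_id": None},
]

# Source "spreadsheets": columns differ, but each carries a customer identifier.
page_views = [
    {"cookie_id": "ck_9f2", "date": "2016-12-01", "page": "/pricing"},
    {"cookie_id": "ck_41b", "date": "2016-12-02", "page": "/blog"},
]
orders = [
    {"email": "ana@example.com", "date": "2016-12-03", "product": "Widget", "price": 19.99},
]
app_events = [
    {"device_id": "dev_77", "date": "2016-12-04", "event": "app_open"},
]

# (table name, rows, identifier column used to match the central table)
SOURCES = [
    ("page_views", page_views, "cookie_id"),
    ("orders", orders, "email"),
    ("app_events", app_events, "device_id"),
]


def unified_view(person_id):
    """Assemble all source rows linked to one individual."""
    person = next(row for row in identity_table if row["person_id"] == person_id)
    view = {"person_id": person_id}
    for name, rows, key in SOURCES:
        identifier = person.get(key)
        # A missing identifier means this source cannot be linked to the person.
        view[name] = [
            row for row in rows if identifier is not None and row[key] == identifier
        ]
    return view


ana = unified_view("P1")
ben = unified_view("P2")
```

Here ana collects a page view, an order, and an app event, while ben has only a page view because no email-based or device-based rows match him. Even in this toy form, the extra moving parts compared with a single table are visible.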

The actual technologies involved with anonymous and identified data involve more than simple spreadsheets.  But you can still safely assume that systems for identified data are more complex than systems for anonymous data.  Challenges facing systems for identified individuals are different as well. Key identified data issues include storing and accessing different data types, making it easy to add new sources, and combining similar data that comes from separate systems.

The different requirements for anonymous and identified data mean it’s hard for one system to do both well. Some vendors don’t even try. Others use a single technology they feel works well enough.  But most who try to do both run what are essentially two different systems, each optimized for one application. They then call on the appropriate system as needed. These vendors differ in the underlying technologies, how much data is actually shared by the two systems, how data in one system can be accessed through the other (if at all), and how the data is presented to administrators, users, and other systems.  Many vendors also improve performance using supplemental technologies such as indexes and summary tables.
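To make the mention of indexes and summary tables more concrete, here is a minimal Python sketch of one common pattern: an inverted index that maps each attribute to the set of cookies holding it, plus a summary table of counts per attribute. It is an assumption about how such a supplement might look, with hypothetical names throughout, and not a claim about how any vendor implements it.

```python
from collections import defaultdict

# Hypothetical raw data: each cookie ID maps to the attributes it holds.
cookie_attributes = {
    "cookie_001": {"sports", "autos"},
    "cookie_002": {"sports", "finance"},
    "cookie_003": {"finance"},
}


def build_index(profiles):
    """Invert the table: attribute -> set of cookie IDs that have it."""
    index = defaultdict(set)
    for cookie_id, attrs in profiles.items():
        for attr in attrs:
            index[attr].add(cookie_id)
    return index


def select(index, *attributes):
    """Return cookies holding all requested attributes via set intersection."""
    sets = [index.get(attr, set()) for attr in attributes]
    return set.intersection(*sets) if sets else set()


index = build_index(cookie_attributes)

# Summary table: precomputed audience size per attribute.
summary = {attr: len(ids) for attr, ids in index.items()}

sports_finance = select(index, "sports", "finance")
```

With the index in place, selecting an audience touches only the cookies that hold the requested attributes instead of scanning every row, and the summary table answers "how many cookies have attribute X" without touching the detail data at all.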

This practical complexity is what muddles the distinction between Customer Data Platforms (CDPs) and Data Management Platforms (DMPs). Most DMPs were originally designed to handle anonymous cookie pools for advertising, using some variant of the “single spreadsheet” model. Most CDPs were designed to manage identified data from multiple sources, using the “multi-spreadsheet” approach. But many DMPs have been extended to handle identified data and many CDPs have added support for advertising audiences. The technical details of how they do this vary widely, but it’s those technical details that determine how well they succeed. So there’s no value in generalizing about which solution is theoretically better.

What does have value is being a smart buyer. This means you should:

  • recognize that managing anonymous data and managing identified data are fundamentally different applications
  • look closely at how any system that does both actually works, keeping an eye out for different internal mechanisms
  • test handling of each data type separately. A system that manages one data type well doesn’t necessarily do an equally good job with the other.