TMW #135 | Towards warehouse-native identity

Jul 23, 2023

Welcome to The Martech Weekly, where every week I review some of the most interesting ideas, research, and latest news. I look to where the industry is going and what you should be paying attention to.

👋 Get TMW every Sunday

TMW is the fastest and easiest way to stay ahead of the Martech industry.  Sign up to get the full version delivered every Sunday for this and every TMW, along with an invite to the TMW community. Learn more here.


I can guarantee you that whenever a technology category starts to evolve in an industry, there’s always one thing that happens: It creates more problems.

Like every other technology shift, whether it’s the rise of the customer data platform (CDP), the advent of the cookie, or the invention of email marketing platforms, there are going to be more problems, and often more problems than solutions. Such is life.

One interesting shift in Martech is the growing category of composable Martech – particularly the composable CDP – and greater investment into warehouse-native marketing technologies. Sometimes these things are called data activation tools, or Reverse ETL. But the shift is clear; the next wave of marketing tech will have to play nice with the data warehouse.

Arpit Choudhury from Databeats says this shift has been on a relatively early growth curve over the past three years, with marketing and data teams just starting to see the benefits:

"What has really changed in the last 3 years, thanks to the availability of better data tooling and the adoption of the cloud data warehouse, is that the possibilities of activating data have increased manifold. But what I really want to highlight is this evolution isn’t just great for GTM people, it’s equally great for data people – they are now able to work closely with their GTM counterparts to build the most efficient experiences powered by the freshest data, thereby making measurable contributions to business growth."

The companies operating in the composable space are seeking to solve the problems introduced by the marketing cloud vendors and CDPs: vendor lock-in, data siloing, and unnecessary constraints on the use and management of data. And while there might be some progress in that space, it does introduce new problems. Chief among them: if you’re not going to use a CDP, how on Earth do you do customer identity management?

New solutions, new problems

Right now, the current situation of a lot of companies that have purchased CDPs looks something like this:

As you can see, there are two silos: the CDP and the data warehouse. Because CDPs are built for getting data to end-Martech and Adtech tools like email marketing platforms, ad networks for running targeted ads, or website personalization, there’s a real need to make sure you can attach the right data to each customer profile.

CDPs have been able to solve this business problem to enable a lot of new marketing, customer experience, and advertising use cases. And I’ve seen in my own practice that it leads to tangible marketing outcomes.  

But the big glaring problem with CDPs is that, in their aim to centralize, organize and enrich customer data, they can quickly become a second source of truth. That increasingly silos the marketing department from the data engineering and IT teams that manage the data warehouse or the on-premises services that also collect and manage customer data. When it comes to identity management, CDPs can be inflexible, opaque and limiting.

In response to these issues, companies like Hightouch, Census, RudderStack and MessageGears have in recent years challenged the entire conception of why we need CDPs in the first place. The idea of composability is this shift towards greater data flexibility and scale:

“Most marketers look at composability and think it’s about building your own technology, API layer, or platform. But I don’t think this captures the spirit of composability; companies often build their own internal monoliths that don’t integrate with anything and become a huge blocker to progress and we don’t call that ‘composable Martech’. Composability does not equal building custom tech.

“Instead, composability has this organizing idea in it that in order to be flexible enough to cope with technological change, privacy regulations and changing consumer expectations from online channels, Martech should be malleable. Martech needs to become more compatible with other solutions to fit into a proper definition of composability.”

With the growth of the data warehouse in enterprise brands, where to store digital data is now a solved problem. The data cloud is becoming the dominant way to store and manage all kinds of business data, turning into a huge business and a critical piece of corporate infrastructure:

The shift to the warehouse is closely followed by the categories of reverse ETL and composable CDP. This is one of the faster-growing categories in the Martech space, attracting quite a bit of venture funding over the past couple of years. Recent examples include this week’s announcement that Hightouch raised $38 million, for a total of $90 million, and a whole range of new ideas like Statsig’s warehouse-native experimentation platform.

So it seems like there’s traction in the data activation space on top of the warehouse, and there’s a good reason for this – why should companies have to pay for and effectively manage two data warehouses when you can extend the capabilities of the warehouse with Martech-specific tools? This is the premise and the value proposition for the entire composability shift in Martech: Less copying and pasting data to our systems, and more flexibility, but at the cost of needing greater technological sophistication.

Is identity a real problem?

The concept of identity resolution usually focuses on a few key ideas. Most conversations on the feasibility and commercial value of identity resolution technologies are centered around the idea of the single or 360 customer view – both marketing terms pushed by vendors and consultants alike.

Both Forrester and Gartner have more or less condemned the whole concept of brands building single views of customers, questioning both its value and its feasibility. Part of the reason is the increasing evidence from both firms that companies are not getting enough value out of the huge amount of work needed to get data, processes, and technologies aligned to allow for anything close to a single view of the customer.

I think the real situation is that offering a single view of the customer is perhaps a solution looking for a problem. Most brands have no real use for a full view of the customer; it does look quite nice in a dashboard though.

It makes more sense to have clear use cases such as joining identities and making those identities available so teams can attach characteristics to them when needed. For example, suppressing customers from ads or direct marketing when they have an open complaint is one of the classic use cases where having the identity stitched across a couple of systems makes a lot of sense.
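As a rough illustration of that suppression use case, here is a minimal Python sketch. It assumes a hypothetical identity map that already links a support-system ID and an email address to one shared customer ID; every name and record in it is invented for illustration and is not taken from any vendor's product.

```python
# Hypothetical identity map: identifiers from two systems resolved to one customer ID.
identity_map = {
    "support:1001": "cust_a",
    "email:ana@example.com": "cust_a",
    "support:1002": "cust_b",
    "email:ben@example.com": "cust_b",
    "email:cy@example.com": "cust_c",
}

# Open complaints live in the support system, keyed by its own ID.
open_complaints = ["support:1001"]

# The ad audience is keyed by email address.
ad_audience = [
    "email:ana@example.com",
    "email:ben@example.com",
    "email:cy@example.com",
]


def suppress(audience, complaints, id_map):
    """Drop audience members whose stitched customer ID has an open complaint."""
    blocked = {id_map[c] for c in complaints if c in id_map}
    return [member for member in audience if id_map.get(member) not in blocked]


suppressed_audience = suppress(ad_audience, open_complaints, identity_map)
print(suppressed_audience)
```

The point of the sketch is that the complaint and the ad audience never share a key directly. Only the stitched customer ID lets one system's signal act on another system's list.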

A lot of the customer 360 value proposition revolves around the customer service realm of business. Knowing a customer’s order history is one thing, but knowing what personalization test groups they were in, the last time they signed into an app, or what ad sets they’re in is not as helpful as most think.

And some of the reasons why we might need a single or 360 view of the customer are spurious. For example, this interesting report from RIS suggests that the perception of value between retailers and consumers differs drastically when it comes to personalization, something that requires a lot of investment into identity resolution. When asked what they want, most consumers said they just wanted more discounts.

This is just one sector of the industry, and there are plenty of counter-examples. But it does raise an interesting question: perhaps we’ve been placing more value on things like personalization and the tech that powers it than we should?

But there is a good rationale for identity stitching outside of this argument. It can improve customer analytics and insights, help in customer support situations, and enhance targeting, and it can also play an important role in coordinating cross-departmental use cases and strategies.

With this backdrop, it’s clear that the 360 view is becoming a thing of the past, but it doesn’t mean we should throw the baby out with the bath water. When it comes to identity, it’s actually about wanting two different things: a single view is about control and visibility, whereas a warehouse-native identity is about flexibility and scale.

When the customer’s profile is unified in the data warehouse, then it makes all of the other transactional, operational and behavioral data more accessible for when a customer needs it. I just don’t think we should confuse one with the other. Warehouse-native identity is a technological shift, not a conceptual one.

Three paths to identity

One of these shifts is using the warehouse for all customer data management, with tools to activate data, run analytics and manage storage. Today, it looks like this:

But there’s something missing. That’s right. Identity resolution! Where should it live?

Should it live here in the warehouse? This is something that your IT department might commonly call the “Golden Customer Record.” And there are plenty of great examples where companies will build their own private CDP within the data warehouse environment.

For that situation, it would look something like this:

Customer identity within the warehouse seems to be one of the more interesting options.
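To make the idea of a stitched record concrete, here is a minimal Python sketch of deterministic identity stitching: records that share any identifier are merged into one profile using a union-find structure. This is an illustrative assumption about how such stitching can work, not the method of any warehouse or vendor mentioned here; in practice it would more likely run as SQL inside the warehouse, and the `stitch` function and sample records are hypothetical.

```python
from collections import defaultdict


def stitch(records):
    """Group records that share any identifier, using a simple union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every identifier that appears together in the same source record.
    for record in records:
        ids = [f"{kind}:{value}" for kind, value in sorted(record.items())]
        if not ids:
            continue
        find(ids[0])  # registers records that carry a single identifier
        for other in ids[1:]:
            union(ids[0], other)

    # Collect identifiers by root: each group is one stitched profile.
    groups = defaultdict(set)
    for identifier in parent:
        groups[find(identifier)].add(identifier)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical rows from different sources (all values are made up).
records = [
    {"email": "ana@example.com", "device": "d-1"},  # web event
    {"email": "ana@example.com", "crm": "c-77"},    # CRM row
    {"device": "d-2", "crm": "c-77"},               # app login
    {"email": "ben@example.com"},                   # newsletter signup
]

profiles = stitch(records)
for profile in profiles:
    print(profile)
```

Exact-match stitching like this is the simplest case. It says nothing about fuzzy matching, conflicting identifiers, or which source wins when attributes disagree, which is where most of the real work sits.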

The advantages are clear: compute and API costs are increasingly expensive, and having identity solutions within the data warehouse could reduce a lot of friction and challenges around storing customer data, along with having more security control over how data is shared. If you’re going to create a “Golden Customer Record,” then it might as well be fully managed in the data warehouse.

But the problem (as with many CDPs) is that this locks brands into a specific methodology for joining and managing identifiers. If your customer attributes – the literal lifeline to communicating with customers and growing your brand – are rented inside Snowflake or GCP, then that’s one major point of failure.

The other problem is that for all the valuable use cases in identity, you need to bring in data that is not normally stored in the data warehouse like website and app events, email marketing engagement, and even advertising metrics. To make identity worth something, you need actionable data that will lead to an outcome.

This is reflected in what I jokingly now call “the octopus,” which highlights that the warehouse really needs integration not only to send data to downstream Martech, but also to feed back data from the apps that create it.

So pursuing identity within the warehouse means a lot of work to bring these other sources of data in too. That’s why some CDPs have an advantage – they can bring in customer data from a variety of places to create valuable use cases right off the bat.

I think this is one of the reasons why data warehouse companies haven’t gotten further than some skirmishes around the edges in the identity resolution space. Data warehouses originally existed to store data – not produce it. A customer identifier – say, a unique ID – is something that’s created as a new thing to store. So a shift to doing identity within the data warehouse is a new concept to grapple with.

Another option is solutions that live outside of the warehouse to enrich customer data inside of it. This is how dbt works: as a service that integrates with the data warehouse to model identity, writing back to tables and schemas in the warehouse.

The last option that both Hightouch and RudderStack have just recently announced is a solution to build customer identity resolution as part of their data activation suite of tools, which looks something like this:

Hightouch calls this a major leap forward for managing customer data, saying that the opportunity is to align activation needs with identity needs on top of the warehouse. In their release of identity resolution, they also repackaged a lot of tools into the Customer 360 Toolkit, which is a fairly dramatic shift out of the data activation and reverse ETL category:

“Customer 360 Toolkit, tackles the data readiness problem head-on, allowing our customers to radically improve the data residing directly in their data warehouse. These capabilities enable companies to define, organize, and link customer and entity records to create a comprehensive, “360-degree” view of their many customers.”

RudderStack describes their approach to identity resolution like this:

“Instead of writing mountains of SQL or dealing with the limits and data silos of costly SaaS tools, Profiles allows your data team to specify important customer traits, then runs the joins and computations automatically, producing an identity graph, user features, and full customer 360 table in your warehouse.”

And Twilio-Segment, a more traditional CDP, is part of Databricks’ Delta Sharing program that allows CDPs to read and action data from the warehouse without copying it to each system. This will overlap with identity eventually as CDPs look for ways to maintain relevancy in an increasingly modern data environment.

One of the interesting advantages here is the coupling. Identity is valuable on two edges: insight and activation. Tying customer identity resolution with data activation will help to make the use of profiles more practical while retaining all of the flexibility, possibilities, and scale by writing identity back into the warehouse. In this way, it’s almost the best of both worlds. Almost.

There are no winners here

So which option is going to be the mainstream way of doing identity resolution in the cloud? There is likely never going to be an answer. Given the widely varying levels of competence, and the diversity in what companies need from customer data, every setup will be different. In the case of warehouse-native identity, the choice is a value add.

But getting it right matters. Over the timeline of data warehouse-native Martech, figuring out how to manage identity is the final boss. And whoever solves it will be able to accelerate the off-ramp from siloed, packaged CDPs to a future in customer data architecture that’s more flexible, dynamic, and able to work with a constantly changing digital economy.

Right now, identity in the warehouse is not yet a solved problem. But just like the promises of a 360 and single view of the customer, we’d be better off treading lightly, lest tech companies wind up with a handful of solutions for problems that don’t exist.


Stay Curious,



Want to share something interesting or be featured in The Martech Weekly? Drop me a line at juan@themartechweekly.com.

Juan Mendoza

Juan Mendoza is an expert in researching global media, marketing, data, and technology trends. He is the CEO of The Martech Weekly, a media and research brand with subscribers in over 65 countries.
