TMW #083 | The bundled warehouse

May 22, 2022

Welcome to The Martech Weekly, where every week I review some of the most interesting ideas, research, and latest news. I look to where the industry is going and what you should be paying attention to.

👋 Get TMW every Sunday

TMW is the fastest and easiest way to stay ahead of the Martech industry. Sign up to get the full version delivered every Sunday for this and every TMW, along with an invite to the TMW community. Learn more here.


Here’s the week in Martech:

  • The bundled warehouse: CDPs and the role of centralized data
  • AdTech, RTB, and data breaches. Yet another scandal in plain sight
  • The COVID-19 half-life. It’s 2019 again, and it’s a strange time to be doing eCommerce
  • Everything else: OpenSea content moderation, AdTech data breaches, Yandex, the wisdom gap, TikTok, the state of crypto, jellyfish product management, and Veecon

✍ Commentary

The bundled warehouse. Enterprise data technologies move very slowly, but every so often there are new ideas that quickly supplant the old ways. This was the Customer Data Platform (CDP) back in 2017, billed as the technology to solve the unification, enrichment, and activation problem that siloed databases and services could not. It quickly overtook the Data Management Platform (DMP) as the main way big brands make big data usable.

Today, you would be laughed out of the boardroom if you proposed a DMP as a viable analytics and customer identity solution. While the CDP has found a significant foothold in the buying decisions of CTOs, the technology is facing new challenges and questions about its role in the modern data stack. Right now the debate is focused on the CDP vs something called "reverse ETL."

The main point of a CDP is the P: it’s a platform. The CDP is several technologies built on top of a cloud database, with a bunch of API connectors and an overlay of analytics, enrichment, and activation tools.

CDPs exist because most enterprise businesses are running on legacy data technology that is horrendously difficult to integrate or use with newer generation marketing automation, analytics, and content management systems.

By bundling all the components needed to use customer data, the CDP became kind of like a cheat code for getting around the unnecessary complexity of trying to make old school systems do jobs they were never designed to do. But in so doing CDPs have a fatal flaw. In most cases, CDPs run on their own database, so you’re just adding another sibling to your already large and crowded database family.

CDPs get around old tech with more tech, often adding more and not less complexity to the data stack. More mouths to feed and more parenting of unruly data is not very fun.

Reverse ETL is a solution that has been making some noise in the past months as a direct challenge to the CDP. Mostly instigated by Hightouch, a data warehouse activation tool, the argument has been focused on the drawback of CDPs as a bundled solution. They cite inflexibility and lack of core product integration for identity resolution as the main ways CDPs fail to deliver value to customers.

In a lot of ways, Hightouch is right. Despite the growing appetite for Customer Data Platforms, by and large, the technology is mostly sitting on the shelf, not meeting the expectations of customers.

Instead, Hightouch argues that reverse ETL should be the new normal for customer data management. Assuming that all your apps are streaming data into a warehouse like Snowflake with a typical extract, transform, and load (ETL) methodology, reverse ETL copies your data from the warehouse and then sends it back to those apps or to other systems.

An example would be CRM data sent from Salesforce into a data warehouse. In Hightouch’s world, you would then enrich or join that data using a service such as Snowplow, and use reverse ETL to send it back to Salesforce CRM with updated information from other sources such as online interactions or email engagement.
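To make the flow concrete, here is a minimal sketch of the reverse ETL loop described above. It is not how Hightouch or Salesforce actually work: an in-memory SQLite database stands in for the warehouse, and the table names, the `FakeCrm` class, and its `upsert` method are all hypothetical names invented for illustration.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    """Create an in-memory stand-in for a cloud warehouse (hypothetical tables)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE crm_contacts (contact_id TEXT PRIMARY KEY, email TEXT);
        CREATE TABLE email_events (contact_id TEXT, event TEXT);
        INSERT INTO crm_contacts VALUES
            ('c1', 'ana@example.com'), ('c2', 'bo@example.com');
        INSERT INTO email_events VALUES
            ('c1', 'open'), ('c1', 'click'), ('c2', 'open');
        """
    )
    return conn


def enriched_contacts(conn: sqlite3.Connection) -> list[dict]:
    """Join CRM rows with email engagement counts inside the warehouse."""
    rows = conn.execute(
        """
        SELECT c.contact_id, c.email,
               SUM(e.event = 'open')  AS opens,
               SUM(e.event = 'click') AS clicks
        FROM crm_contacts c
        LEFT JOIN email_events e ON e.contact_id = c.contact_id
        GROUP BY c.contact_id, c.email
        ORDER BY c.contact_id
        """
    ).fetchall()
    # SUM over no matching rows yields NULL, so fall back to 0.
    return [
        {"contact_id": r[0], "email": r[1], "opens": r[2] or 0, "clicks": r[3] or 0}
        for r in rows
    ]


class FakeCrm:
    """Hypothetical CRM client; a real one would call the vendor's API."""

    def __init__(self) -> None:
        self.records: dict[str, dict] = {}

    def upsert(self, record: dict) -> None:
        self.records[record["contact_id"]] = record


def reverse_etl(conn: sqlite3.Connection, crm: FakeCrm) -> int:
    """Copy enriched rows out of the warehouse and back into the CRM."""
    records = enriched_contacts(conn)
    for record in records:
        crm.upsert(record)
    return len(records)


if __name__ == "__main__":
    crm = FakeCrm()
    synced = reverse_etl(build_warehouse(), crm)
    print(synced, crm.records["c1"])
```

The point of the sketch is the direction of travel: the join happens in the warehouse, and the operational tool only receives the finished result.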

This all seems fine until you realize that the majority of the world’s companies are still running on-premises servers that don’t work in the same way as their cloud counterparts. Say it with me: Enterprise data technology moves slowly.

The challenge to the idea of reverse ETL from the CEO of mParticle highlights the importance of understanding customer data maturity and needs. CDPs exist because enterprise-owned servers can’t do what the cloud can, and while in the grand scheme of things having two data warehouses running in parallel isn’t ideal, it is a way to get into the market with usable data to meet the needs of most use cases. A dollar made today is better than a dollar promised tomorrow.

However, using a data warehouse as a single source has advantages. Activation tools differ greatly from analytics tools, and enrichment and integration processes pose unique challenges for every company. Decoupling these needs from a centralized platform like a CDP can help more mature, cloud-native companies create an optimal environment for making data usable.

Like everything in enterprise technology, nothing is as simple as it seems. Rearchitecting the way companies manage, store, and process data is like watching the tectonic plates shift: it takes a long time and changes the foundations of business. It’s slow, arduous, and extremely high-risk work.

Reverse ETL is only viable for modern companies that have the majority of their data in the cloud and engineering teams that actively work on integrating and enriching the data that goes into a data warehouse. The reality is that unstructured, unclean customer data is more of a liability than an asset, and in almost every big brand there is a lot of it.

mParticle makes a good point here that Snowflake has been successful in turning enterprise brands into major data hoarders, which only increases costs and creates unsustainable levels of electronic waste.

The debate about doing everything in a warehouse or in a CDP comes down to who you’re serving and what kind of technological maturity exists. I’ve seen marketers pick up and learn the basics of a CDP by defining events, creating an audience, and sending that audience to another app. This, in its own right, is actually amazing. It’s a huge advantage for non-engineers to be able to manage and activate customer data, and CDPs are one of the best tools out there that facilitate this kind of data ownership.
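For readers who have not used a CDP, the three steps above (define events, create an audience, send it somewhere) can be sketched in a few lines. This is an illustrative toy, not any vendor's API: the `Event` type, `build_audience`, `send_audience`, and the list used as a destination are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A defined customer event, e.g. a page view or an add-to-cart."""
    user_id: str
    name: str


def build_audience(events: list[Event], event_name: str, min_count: int) -> list[str]:
    """Return users who triggered `event_name` at least `min_count` times."""
    counts: dict[str, int] = {}
    for event in events:
        if event.name == event_name:
            counts[event.user_id] = counts.get(event.user_id, 0) + 1
    return sorted(user for user, n in counts.items() if n >= min_count)


def send_audience(audience: list[str], destination: list[str]) -> int:
    """Push the audience to a destination (a plain list stands in for another app)."""
    destination.extend(audience)
    return len(audience)


if __name__ == "__main__":
    events = [
        Event("u1", "add_to_cart"),
        Event("u1", "add_to_cart"),
        Event("u2", "add_to_cart"),
        Event("u2", "page_view"),
    ]
    email_tool: list[str] = []
    audience = build_audience(events, "add_to_cart", min_count=2)
    send_audience(audience, email_tool)
    print(email_tool)
```

In a real CDP these steps sit behind a point-and-click interface, which is exactly why non-engineers can own them.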

The warehouse will still need a layer of abstraction so that marketing, customer service, product, and analytics teams can make the data usable. Either way, the question worth answering is not about where the data lives, it’s about how it can be used by diverse teams with varying needs.

CDPs do this better because they are bundled. Proposing, as Hightouch does, that unbundling the CDP is a good thing is ludicrous. This effectively means that marketers will have to rely on even more people to learn, manage and execute all the various apps that will have to be plugged into a data warehouse. It's a recipe for more cooks in the kitchen.

Eventually, most enterprise brands will be renting their data from the cloud precisely because a bundled warehouse offers a combination of scalability, security, and integration that self-managed data environments cannot.

Both CDP and reverse ETL companies completely miss this point - if it’s Snowflake or AWS or Azure that is storing the data, then it’s going to be these companies that build the solutions to analyze, enrich, and activate customer data at scale. This is because offering another feature to an existing data warehouse is more valuable than adding a completely new data platform or attempting to fracture a data practice into many apps. The outlook on customer data management is less about unbundling the CDP than about bundling the warehouse. Links: HIGHTOUCH EXHIBIT: A & B. MPARTICLE. DIGIDAY. FORRESTER. UNBUNDLING THE DATABASE. CLOUD STATS.

📈 Chart Of The Week

The COVID-19 half-life. The pandemic was a forcing function for people to embrace eCommerce. But now that it’s subsiding, the growth curves are declining and are back to 2019 adoption patterns. We’re now in this strange half-life where eCommerce is returning to its pre-pandemic growth trends, but far more people are familiar with online experiences, which means a rapidly maturing market. This analysis calls it a “stagflation.” Link

📰 Latest Developments

AdTech, RTB, and data breaches. The ICCL has released a report claiming that RTB (real-time bidding) ad auctioning systems are processing and passing people’s data more than 294 billion times a day in the US. The ICCL is calling it a data breach, but this has been business as usual for years for the major AdTech players. There is no clear way to restrict what happens to data once RTB technology has broadcast it to various data partners, which opens up a number of questions about data custody and ownership. Link

Google I/O. The annual conference is one of the major ways Google announces new tech products. A few key callouts include YouTube updates to automatically transcribe and segment content out of a video, a deep dive into multi-search functionality, and the new ad center tool. There were also a number of hardware updates. Link

OpenSea does content moderation. The biggest success story from the Web3 movement has announced that it will now implement tools to take down fake NFTs and build greater moderation on the platform. This raises the question – if OpenSea is now a functional Web2 company, and if Web3 is about the promise of decentralization, then what’s the value proposition of OpenSea again? Hmmm. Link

📚 Reading

The strange state of Yandex. A profile on Russia’s largest tech company. Link

50 things about Martech. A great piece on forecasting some of the major changes to the industry over the next few years. Privacy and regulation will perhaps be the most important thing to happen to the industry since it began. Link

The wisdom gap. How technology decreases our sense-making abilities as complexity increases. There’s never been so much to reason through. Link

🔢 Data & Insights

The value of loyalty programs. About 50% of customers in a 2022 loyalty index report say that they get little value from their loyalty programs. A lot of what drives brands to create and maintain these programs has very little to do with making their customers’ lives better. Link

TikTok. Scott Galloway summarizes a number of important data points around the fastest-growing social app. TikTok was the most visited website in 2021 and has 1.6 billion users, which is more than Twitter, LinkedIn and Snapchat combined. In 2021 it delivered 22.6 trillion minutes of watched video content, more than twice that of Netflix. Compare this to Facebook, which recently reported that its most popular page is literally a misinformation publication. Times are changing, and the center of social media innovation is shifting to China. Links: TIKTOK. FB.

State of crypto. A16z did a big deep dive into the state of the crypto industry. The VC firm has one of the largest crypto/web funds, so there’s a lot of posturing in this piece. A few highlights: DeFi (decentralized finance) grew to a $100 billion market in less than two years, and NFT creators made about $4 billion in 2021, which averages out to $174k per creator. Mind you, most crypto projects have uneven concentrations of wealth creation, which means that the web3 movement has been very successful in turning millionaires into billionaires. Link

💡 Ideas

Pakistani tech unicorn candidates. Link

The freedom-specificity tradeoff. Good piece from Other Life talking about different kinds of positioning consequences for content creators. Go niche and quickly build fandom, but then hit a ceiling. Go broad and be prepared for years of irrelevance until you tap into a much larger audience. Both are viable, but not easy. Link

Jellyfish Product Management. Marketing, product, and research are diagrammed to look like a sea creature. It kind of works... Kind of. Link

✨ Weird and Wonderful

Veecon. Gary Vee’s big tent web3 conference. They had so many men and so few women attending that they turned the women’s bathroom into another men’s. Gary Vee is often ridiculed for his theatrics, but I can’t think of another influencer that can so effectively influence the mind of the 20-something tech bro. Link

The rules of civil conversation. Link

Toxic fandom. When a sad defamation trial builds a massive fan base. Link


Stay Curious,



Want to share something interesting or be featured in The Martech Weekly? Drop me a line at juan@themartechweekly.com.

Juan Mendoza

Marketing technology strategist at The Lumery. I analyse marketing, data, and technology trends for some of the most well-known Australian and global brands.
