How to build a Marketing Data Hub — The Modern Data Stack for Marketing

Krisjan Oldekamp— 

The Modern Data Stack is a cloud data warehouse connected to various specialized Software-as-a-Service (SaaS) tools, and it forms the foundation for a Marketing Data Hub. It lets you integrate multiple data sources into a data warehouse and operationalize that data in marketing channels or other systems (the Composable CDP). But where do you start building a modern data stack for marketing, and how can marketing benefit from it?

The modern data stack consists of various best-of-breed solutions, each providing a piece of functionality. Think of it as a modular system where you can relatively easily add or replace building blocks.

The data and marketing 'technology stack' are blending together.

Within the modern data stack, it's not just about collecting data in a central database (data warehouse) so that analysts or data scientists alone can do their work. Data democratization, meaning giving everyone in the organization access to reliable data, and the ability to operationalize that data are increasingly becoming an integral part of the system (the stack).

Image: From insights to action

Data from the warehouse can be operationalized in other systems (such as marketing channels or CRM systems). So, data doesn't just go into the warehouse, but also comes out. For instance, there are tools that allow non-technical users to easily assemble target groups and automatically share them with a marketing automation tool or Google Ads. All of this is based on the data already present in the data warehouse. This development is now known as the Composable Customer Data Platform (CDP).

Image: Screenshot of Hightouch (Reverse ETL): Marketers can build audiences based on data in the data warehouse / Marketing Hub.

Also, components such as data transformation, governance, and self-service business intelligence are, in most cases, a fixed part of the Modern Data Stack. The fact is that the data and marketing technology stacks are slowly blending together...

The rise of the Modern Data Stack

Several developments led to the rise of the modern data stack:

  1. The shift from on-premise to cloud-native data warehouses (e.g. analytical databases like Amazon Redshift, Snowflake, or Google BigQuery).

  2. The rise of SaaS tools and services within cloud environments that have a modular approach (building blocks).

  3. The emergence of self-service business intelligence tools to make (reliable) data available to everyone within an organization (data democratization).

The great advantage of the modern data stack is that setting up such a system has become less complex. You don't need an army of data engineers to stack the building blocks on top of each other. However, this doesn't mean the role of the data engineer has become redundant. Quite the opposite: the modern data stack can make the data engineer's life easier and more productive.

Thanks to tools that can be integrated into the stack, end users themselves can work with the data. This includes both gaining insights and operationalizing the data. This applies not only to analysts but also to marketing, sales, or operations teams.

Another plus is that the system is modularly built, and costs are often dependent on what you use. For both small and large organizations, a modern data stack is highly suitable: it can be expanded when necessary, in both scale and functionality. But what components make up such a modern data stack, and where do you start building?

The components of a Modern Data Stack

You can divide the stack into the following main categories, which I will explain step by step:

  1. Data sources & integration

  2. Data storage and transformation (ELT)

  3. Data analysis and visualization

  4. Data activation (operations)

  5. Data governance and Observability

Image: The modern (marketing)-data stack: 5 components

For each component, I mention a number of vendors in no particular order. Of course, there are more options; the overviews from Castor or Snowplow provide a more comprehensive view of all available SaaS tools.

1. Data integration: importing sources to the data warehouse

As mentioned, it has become much simpler to connect all the various data sources within your organization to the data warehouse, without the need for an army of IT specialists and data engineers. This doesn't mean that a data engineer is any less valuable.

The tools available in the market often contain 100 or more ready-made connectors. Think of pulling in marketing performance data from Facebook, Google, or LinkedIn, as well as CRM or e-commerce data from HubSpot, Salesforce, or Shopify. All of this can be configured without much technical expertise. However, it's advisable to have someone knowledgeable oversee the setup. Some popular providers of data integration tools are:

  • Fivetran

  • Stitch

  • Adverity

  • Airbyte (open-source)

  • Matillion

  • Keboola

These tools ultimately load the data into the analytical database in the cloud data warehouse.

Not just customer data

Because we mention the Composable Customer Data Platform (CDP) concept, one might get the impression that only customer data is imported. However, this is absolutely not the case. Other examples include performance data from marketing channels, product and pricing data, socio-demographic data, weather data, etc.

2. Data storage & transformation

The centerpiece of the modern data stack is the (cloud-native) analytical database. These types of databases are capable of rapidly analyzing vast amounts of data. The most well-known products are:

  • Snowflake

  • Google BigQuery

  • Amazon Redshift

  • Azure Synapse / Fabric

Almost all data integration tools mentioned in the previous section support the above databases as their destination. From experience, we know that choosing between different databases can be challenging, although their capabilities aren't vastly different. For an excellent comparison, I recommend checking out this article by Rogier Werschkull.

To ensure data remains consistent and usable throughout the process, it's essential to transform the data (think cleaning and merging). Since cloud-native analytical databases are so powerful, it's entirely feasible to load untransformed data (as-is) into the data warehouse and tackle data transformation as a subsequent step. This is the shift from ETL (Extract, Transform, Load) to ELT (Extract, Load, Transform). It also changes the functionality expected from the data integration tools discussed in the previous section: all data transformation is then centralized in the data warehouse rather than being split between the integration process and the data warehouse.

Popular data transformation tools that make life easier for data engineers and analysts are dbt and Google's Dataform.
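To make the ELT idea concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not how any particular vendor works: an in-memory SQLite database stands in for the cloud data warehouse, and the table names, the `RAW_AD_ROWS` sample data, and the function names are all hypothetical. The raw rows are loaded untouched first, and cleaning and aggregation happen afterwards with SQL inside the database.

```python
import sqlite3

# Hypothetical raw export from ad platforms; values arrive as untidy strings.
RAW_AD_ROWS = [
    ("2024-01-01", "Google", " 120.50 "),
    ("2024-01-01", "facebook", "80"),
    ("2024-01-02", "GOOGLE", "99.5"),
]


def extract_and_load(conn, rows):
    """E + L: store the source rows as-is in a raw table."""
    conn.execute("CREATE TABLE raw_ad_spend (day TEXT, channel TEXT, spend TEXT)")
    conn.executemany("INSERT INTO raw_ad_spend VALUES (?, ?, ?)", rows)


def transform(conn):
    """T: clean and aggregate with SQL inside the (stand-in) warehouse."""
    conn.execute(
        """
        CREATE TABLE ad_spend_by_channel AS
        SELECT lower(trim(channel)) AS channel,
               SUM(CAST(trim(spend) AS REAL)) AS total_spend
        FROM raw_ad_spend
        GROUP BY lower(trim(channel))
        """
    )


def run_elt():
    """Run load and transform, then return the cleaned result rows."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite as a warehouse stand-in
    extract_and_load(conn, RAW_AD_ROWS)
    transform(conn)
    return conn.execute(
        "SELECT channel, total_spend FROM ad_spend_by_channel ORDER BY channel"
    ).fetchall()


if __name__ == "__main__":
    print(run_elt())
```

Because the raw table is kept, the transformation can be changed and rerun later without pulling the data from the source again, which is roughly the role tools like dbt or Dataform play on top of a real warehouse.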

3. Data analysis: Towards a ‘Self-Service’ Model

Making data accessible to everyone within an organization is crucial for cultivating a data-driven culture. Hence, even without technical knowledge like SQL, retrieving desired data should be straightforward. It's therefore essential to choose a modern business intelligence tool that integrates with cloud-native databases. Most tools now offer self-service functionality, enabling end users to resolve their inquiries using data. This could involve easily exploring data sources or delving deeper into existing reports. Renowned providers include:

  • Looker / Looker Studio

  • Tableau

  • Power BI

  • Qlik

  • Apache Superset (open source)

  • Metabase (open source)

4. Data Operations: Activate Data from the Data Warehouse

Wouldn't it be great not only to analyze but also to (automatically) utilize the data in, for instance, your marketing channels or CRM? Effectively becoming a part of the marketing-technology stack?

Around 2021, tools designed to operationalize data from a data warehouse became more and more popular. This is often referred to as Reverse ETL. As the name suggests, Reverse ETL does the opposite of ETL (Extract, Transform, Load): it sends data from your data warehouse back to the source systems or on to other systems.

This benefits not just marketing but also departments like sales or HR. Examples include automatically creating Google Ads or Facebook audiences, enriching insights in a CRM system (360º customer view), or personalizing email campaigns. Popular tools:

  • Census

  • Hightouch
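The Reverse ETL flow described above can be sketched in a few lines of Python. This is a simplified illustration and not the API of Census or Hightouch: SQLite stands in for the data warehouse, `FakeAudienceDestination` is a hypothetical placeholder for an ad platform or marketing automation tool, and the `customers` table and its columns are made up for the example.

```python
import sqlite3


def build_warehouse():
    """Create a stand-in warehouse with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers "
        "(email TEXT, lifetime_value REAL, last_order_days_ago INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            ("anna@example.com", 540.0, 12),
            ("ben@example.com", 75.0, 200),
            ("chris@example.com", 910.0, 45),
        ],
    )
    return conn


class FakeAudienceDestination:
    """Hypothetical stand-in for a marketing channel that accepts audiences."""

    def __init__(self):
        self.audiences = {}

    def upsert_audience(self, name, emails):
        # A real destination would receive an API call here.
        self.audiences[name] = sorted(emails)


def sync_audience(conn, destination, name, min_value, max_days_since_order):
    """Select a segment in the warehouse and push it to the destination."""
    rows = conn.execute(
        "SELECT email FROM customers "
        "WHERE lifetime_value >= ? AND last_order_days_ago <= ?",
        (min_value, max_days_since_order),
    ).fetchall()
    emails = [row[0] for row in rows]
    destination.upsert_audience(name, emails)
    return len(emails)


if __name__ == "__main__":
    warehouse = build_warehouse()
    destination = FakeAudienceDestination()
    sync_audience(warehouse, destination, "high_value_recent", 500, 90)
    print(destination.audiences)
```

The point of the sketch is that the audience definition lives in the warehouse as a query, so the same customer data can feed several destinations without being copied into a separate platform first.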

From Insight to Action

Image: Marketing Data Hub: From Insights To Action

The foundation for a Composable CDP

Reverse ETL essentially positions the modern data stack to compete with stand-alone Customer Data Platforms (CDPs). One could view the Marketing Data Hub as the foundation for a Composable CDP.

5. Data governance and Observability

The final but by no means the least important step: data governance and data quality. Trust in the data is, in fact, one of the most crucial components for the entire system to succeed. This is a category where a lot of development is currently happening, and it is expected to soar in 2022. Broadly, there are two types of tools in this category: data catalogs and data quality & observability.

Data catalogs

How can you tell from which data source a specific dashboard metric originates? Or conversely, if table X changes, which dashboards or systems might be potentially affected? Data catalogs help map this out. These tools link various data sources, tables, dashboards, or other data creators and users (data lineage). They often also provide documentation capabilities and a search function. Some tools include:

  • Atlan

  • Apache Atlas

  • Amundsen

  • DataHub
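The two lineage questions above (where does a dashboard metric come from, and what is affected if table X changes) boil down to walking a dependency graph in either direction. The following Python sketch shows the idea under assumptions: the `LINEAGE` mapping and all table and dashboard names are hypothetical, and real data catalogs typically derive this graph automatically from query logs or tool metadata.

```python
from collections import deque

# Hypothetical lineage: each key feeds the assets listed as its value.
LINEAGE = {
    "raw_ad_spend": ["ad_spend_by_channel"],
    "raw_orders": ["customer_value"],
    "ad_spend_by_channel": ["marketing_dashboard"],
    "customer_value": ["marketing_dashboard", "crm_sync"],
}


def downstream(graph, start):
    """Return every asset that directly or indirectly depends on `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


def upstream(graph, target):
    """Return every asset that `target` directly or indirectly depends on."""
    reverse = {}
    for parent, children in graph.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return downstream(reverse, target)


if __name__ == "__main__":
    print("If raw_orders changes, check:", downstream(LINEAGE, "raw_orders"))
    print("marketing_dashboard is built from:", upstream(LINEAGE, "marketing_dashboard"))
```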

Data quality & observability

Besides providing insights and documentation, tools can also automatically monitor the different data sources. Monitoring can be based either on machine learning that detects data anomalies or on manually added checks (for example, that essential fields are not missing or that data was updated within the last 24 hours). When discrepancies are found, alerts can be sent, for instance, through email or Slack. Tools include:

  • Monte Carlo

  • Soda

  • Elementary (for dbt)
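The manually added checks mentioned above, missing essential fields and data freshness within 24 hours, can be sketched in plain Python. This is an illustration only and not the interface of Monte Carlo, Soda, or Elementary: the function names, the `email` and `updated_at` field names, and the row format (a list of dictionaries) are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def check_required_fields(rows, required):
    """Report rows where an essential field is missing or empty."""
    issues = []
    for index, row in enumerate(rows):
        missing = [field for field in required if row.get(field) in (None, "")]
        if missing:
            issues.append(f"row {index}: missing {', '.join(missing)}")
    return issues


def check_freshness(rows, timestamp_field, now, max_age=timedelta(hours=24)):
    """Report when the newest record is older than the allowed age."""
    timestamps = [row[timestamp_field] for row in rows if row.get(timestamp_field)]
    if not timestamps:
        return ["no timestamps found"]
    newest = max(timestamps)
    if now - newest > max_age:
        return [f"stale data: newest record is from {newest.isoformat()}"]
    return []


def run_checks(rows, now):
    """Combine both checks; a real setup would send the issues as alerts."""
    return check_required_fields(rows, ["email", "updated_at"]) + check_freshness(
        rows, "updated_at", now
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        {"email": "anna@example.com", "updated_at": now - timedelta(hours=30)},
        {"email": "", "updated_at": now - timedelta(hours=26)},
    ]
    for issue in run_checks(sample, now):
        print("ALERT:", issue)
```

Passing `now` in as a parameter rather than reading the clock inside the check keeps the freshness rule easy to test with fixed timestamps.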

Start with a business case

Our top advice: start with a business case. The Modern Data Stack is perfect for this approach. Not every source needs to be connected right from the start. Begin with one or a few business cases and expand from there. This way, you can quickly create value and build support. Demonstrate that it's not just a tech matter: with the right setup, the output should be easily accessible to the entire organization, both as insights and as operationalized data, within marketing but also in other departments.

Initially, key fundamental decisions will have to be made, but expansion can be gradual afterward. Engage the right people and ensure the necessary expertise is on hand to establish a solid foundation.
