How to build a Marketing Data Hub — The Modern Data Stack for Marketing

The modern data stack consists of various best-of-breed solutions, each providing a piece of functionality. Think of it as a modular system where you can relatively easily add or replace building blocks.

The data and marketing 'technology stack' are blending together.

From insights to action

Data from the warehouse can be operationalized in other systems (such as marketing channels or CRM systems). So, data doesn't just go into the warehouse, but also comes out. For instance, there are tools that allow non-technical users to easily assemble target groups and automatically share them with a marketing automation tool or Google Ads. All of this is based on the data already present in the data warehouse. This development is now known as the Composable Customer Data Platform (CDP).

Screenshot of Hightouch (Reverse ETL): Marketers can build audiences based on data in the data warehouse / Marketing Hub.

Also, components such as data transformation, governance, and self-service business intelligence are, in most cases, a fixed part of the Modern Data Stack. The fact is that the data and marketing technology stacks are slowly blending together...

The rise of the Modern Data Stack

Several developments led to the rise of the modern data stack:

The shift from on-premise to cloud-native data warehouses (e.g. analytical databases like Amazon Redshift, Snowflake, or Google BigQuery).
The rise of SaaS tools and services within cloud environments that have a modular approach (building blocks).
The emergence of self-service business intelligence tools to make (reliable) data available to everyone within an organization (data democratization).

The great advantage of the modern data stack is that it has become less complex to set up such a system. You don't need an army of data engineers to stack the building blocks on top of each other. However, this doesn't mean the role of the data engineer has become redundant. Quite the opposite: the modern data stack can make the data engineer's life easier and more productive.

Thanks to tools that can be integrated into the stack, end users themselves can work with the data. This includes both gaining insights and operationalizing the data. This applies not only to analysts but also to marketing, sales, and operations teams.

Another plus is that the system is modularly built, and costs are often dependent on what you use. For both small and large organizations, a modern data stack is highly suitable: it can be expanded when necessary, in both scale and functionality. But what components make up such a modern data stack, and where do you start building?

The components of a Modern Data Stack

You can divide the stack into the following main categories, which I will explain step by step:

Data sources & integration
Data storage and transformation (ELT)
Data analysis and visualization
Data activation (operations)
Data governance and Observability

The modern (marketing)-data stack: 5 components

1. Data integration: importing sources to the data warehouse

As mentioned, it has become much simpler to connect all the various data sources within your organization to the data warehouse, without the need for an army of IT specialists and data engineers. This doesn't mean that a data engineer is any less valuable.

The tools available in the market often contain 100 or more ready-made connectors. Think of pulling in marketing performance data from Facebook, Google, or LinkedIn, as well as CRM data from Hubspot, Salesforce, or Shopify. All of this can be configured without much technical expertise. However, it's advisable to have someone knowledgeable oversee the setup. Some popular providers of data integration tools are:

Fivetran
Stitch
Adverity
Airbyte (open-source)
Matillion
Keboola

These tools ultimately pull data to the analytical database in the cloud data warehouse.


2. Data storage & transformation

The centerpiece of the modern data stack is the (cloud-native) analytical database. These types of databases are capable of rapidly analyzing vast amounts of data. The most well-known products are:

Snowflake
Google BigQuery
Amazon Redshift
Azure Synapse / Fabric

To ensure data remains consistent and usable throughout the process, it's essential to transform the data (think cleaning and merging). Since cloud-native analytical databases are so powerful, it's entirely feasible to load untransformed data (as-is) into the data warehouse and tackle data transformation as a subsequent step. This is the shift from ETL to ELT (from Extract, Transform, Load to Extract, Load, Transform). It also changes what is expected from the data integration tools discussed in the previous section: they only need to extract and load. All data transformation is then centralized in the data warehouse rather than being partly in the integration process and partly in the data warehouse.
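To make the ELT order concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for a cloud data warehouse, and the table names, columns, and sample rows are all hypothetical. Raw rows are loaded untouched first, and cleaning and aggregation happen afterwards in SQL inside the database.

```python
import sqlite3

# Hypothetical raw export from an ads connector, loaded as-is (all strings).
RAW_ROWS = [
    ("2024-01-01", "Google ", "120.50"),
    ("2024-01-01", "facebook", "80.00"),
    ("2024-01-02", "google", "99.50"),
    ("2024-01-02", "Facebook", ""),  # missing spend value
]


def load_raw(conn, rows):
    """Extract + Load: store the source rows untouched in a raw table."""
    conn.execute("CREATE TABLE raw_ad_spend (day TEXT, channel TEXT, spend TEXT)")
    conn.executemany("INSERT INTO raw_ad_spend VALUES (?, ?, ?)", rows)


def transform(conn):
    """Transform: clean and aggregate inside the database using SQL."""
    conn.execute(
        """
        CREATE TABLE ad_spend_by_channel AS
        SELECT LOWER(TRIM(channel)) AS channel,
               SUM(CAST(spend AS REAL)) AS total_spend
        FROM raw_ad_spend
        WHERE spend <> ''
        GROUP BY LOWER(TRIM(channel))
        """
    )
    return conn.execute(
        "SELECT channel, total_spend FROM ad_spend_by_channel ORDER BY channel"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
load_raw(conn, RAW_ROWS)
result = transform(conn)
print(result)
```

Because the raw table is kept, the transformation can be changed and re-run later without extracting the data from the source again.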

3. Data analysis: Towards a ‘Self-Service’ Model

Making data accessible to everyone within an organization is crucial for cultivating a data-driven culture. Retrieving the data you need should therefore be straightforward, even without technical skills such as SQL. It's essential to choose a modern business intelligence tool that integrates with cloud-native databases. Most tools now offer self-service functionality, enabling end users to answer their own questions with data. This could involve easily exploring data sources or delving deeper into existing reports. Well-known providers include:

Looker / Looker Studio
Tableau
Power BI
Qlik
Apache Superset (open source)
Metabase (open source)

4. Data Operations: Activate Data from the Data Warehouse

Wouldn't it be great not only to analyze but also to (automatically) utilize the data in, for instance, your marketing channels or CRM? Effectively becoming a part of the marketing-technology stack?

Around 2021, tools designed to operationalize data from a data warehouse became more and more popular. This is often referred to as Reverse ETL. As the name suggests, Reverse ETL does the exact opposite of ETL (Extract, Transform, Load): it reintroduces data from your data warehouse back to its source or other systems.

This doesn't just benefit marketing but also departments like sales or HR. Examples include automatically creating Google Ads or Facebook audiences, enriching customer records in a CRM system (360º customer view), and personalizing email campaigns. Popular tools:

Census
Hightouch
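The Reverse ETL idea described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Census or Hightouch work internally: SQLite stands in for the warehouse, the `customers` table and the segment rule are hypothetical, and the actual push to an ad platform or CRM is left out. The sketch selects an audience from the warehouse and computes which members need to be added to or removed from the destination.

```python
import sqlite3


def build_audience(conn, min_orders):
    """Select customer emails from the warehouse that match a segment rule."""
    rows = conn.execute(
        "SELECT email FROM customers WHERE order_count >= ? ORDER BY email",
        (min_orders,),
    ).fetchall()
    return {email for (email,) in rows}


def plan_sync(audience, already_synced):
    """Diff the desired audience against what the destination already holds."""
    return {
        "add": sorted(audience - already_synced),
        "remove": sorted(already_synced - audience),
    }


# Stand-in warehouse with a hypothetical `customers` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (email TEXT, order_count INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("ann@example.com", 5), ("bob@example.com", 1), ("cy@example.com", 3)],
)

audience = build_audience(conn, min_orders=3)
# State the destination held after the previous sync (assumed for illustration).
plan = plan_sync(audience, already_synced={"bob@example.com", "cy@example.com"})
print(plan)
```

Sending only the difference, rather than the full audience on every run, keeps the number of calls to the destination system small.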


Marketing Data Hub: From Insights To Action

5. Data governance and Observability

The final, but by no means least important, step is data governance and data quality. Trust in the data is, in fact, one of the most crucial components for the entire system to succeed. This is a category where a lot of development is currently happening, and it is expected to soar in 2022. Broadly, there are two types of tools in this category: data catalogs and data quality & observability.

Data catalogs

How can you tell from which data source a specific dashboard metric originates? Or conversely, if table X changes, which dashboards or systems might be potentially affected? Data catalogs help map this out. These tools link various data sources, tables, dashboards, or other data creators and users (data lineage). They often also provide documentation capabilities and a search function. Some tools include:

Atlan
Apache Atlas
Amundsen
DataHub
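The "if table X changes, what is affected?" question is a graph traversal over lineage. A minimal sketch in Python, assuming a hand-written lineage mapping with hypothetical asset names (real catalogs collect this metadata automatically):

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets that read from it.
LINEAGE = {
    "raw_orders": ["orders_clean"],
    "orders_clean": ["revenue_daily", "customer_360"],
    "revenue_daily": ["dashboard_revenue"],
    "customer_360": ["dashboard_retention", "crm_sync"],
}


def downstream(asset, lineage):
    """Return every asset reachable from `asset`, in breadth-first order."""
    seen = []
    queue = deque(lineage.get(asset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited via another path
        seen.append(node)
        queue.extend(lineage.get(node, []))
    return seen


impacted = downstream("orders_clean", LINEAGE)
print(impacted)
```

Reversing the mapping (each asset pointing to its sources) answers the opposite question: which data source a given dashboard metric originates from.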

Data quality & observability

Besides providing insights and documentation, these tools can automatically monitor data sources. Monitoring can be based on machine learning that detects data anomalies, or on manually added checks (for example, that essential fields are not missing, or that the data was updated within the last 24 hours). When a check fails, alerts can be sent, for instance, through email or Slack. Tools include:

Monte Carlo
Soda
Elementary (for dbt)
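The two manual checks mentioned above (missing essential fields and freshness within 24 hours) can be sketched in plain Python. This is a simplified illustration, not the API of any of the tools listed; the function names, field names, and sample rows are hypothetical, and the alert is only collected as text instead of being sent to email or Slack.

```python
from datetime import datetime, timedelta, timezone


def check_required_fields(rows, required):
    """Return indexes of rows where a required field is missing or empty."""
    return [
        i for i, row in enumerate(rows)
        if any(row.get(field) in (None, "") for field in required)
    ]


def check_freshness(last_updated, now, max_age=timedelta(hours=24)):
    """Return True when the data was updated within `max_age` of `now`."""
    return now - last_updated <= max_age


def run_checks(rows, required, last_updated, now):
    """Collect human-readable alerts; an empty list means all checks passed."""
    alerts = []
    bad_rows = check_required_fields(rows, required)
    if bad_rows:
        alerts.append(f"missing required fields in rows {bad_rows}")
    if not check_freshness(last_updated, now):
        alerts.append("data not updated within the last 24 hours")
    return alerts


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    {"email": "ann@example.com", "country": "NL"},
    {"email": "", "country": "BE"},  # essential field left empty
]
alerts = run_checks(rows, ["email", "country"], now - timedelta(hours=30), now)
print(alerts)
```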

Start with a business case

Our top advice: start with a business case. The Modern Data Stack is perfect for this approach. Not every source needs to be connected right from the start. Begin with one or a few business cases and expand from there. This way, you can quickly create value and support. Demonstrate that it's not just a tech matter. With the right setup, the output should be easily accessible to the entire organization, both as insights and as operationalized data, within marketing as well as other departments.

Initially, key fundamental decisions will have to be made, but expansion can be gradual afterward. Engage the right people and ensure the necessary expertise is on hand to establish a solid foundation.